Declarative Amsterdam
@ Amsterdam Science Park
2023-11-02
Director and CTO of Evolved Binary
XML / XQuery / XSLT / RDF / SPARQL
Scala / Java / C++ / Rust
Concurrency and Scalability
Creator of FusionDB multi-model database
Contributor to Meta's (formerly Facebook's) RocksDB (7 yrs.)
Core contributor to eXist-db XML Database (18 yrs.)
Founder of EXQuery, and creator of RESTXQ
Formerly a W3C XQuery WG Invited Expert
A loosely defined set of related concepts!
A "package" is a container of one or more things
Might conform to a standard size, shape, or construction
Might ease the storage of things
Might ease the transportation of things
The act of "packaging" is that of containing the things
We will focus on Packaging of:
Software Code
Data
Photo by Jiawei Zhao on Unsplash
Software Reuse
Modern software architecture is modular
We are dependent on Software Libraries
Each may consist of many files
In turn dependent on other libraries, ad infinitum
We may wish to publish our App/Libraries
May depend on many libraries
May consist of many files
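A minimal sketch of why "ad infinitum" matters: consuming one library means resolving everything it depends on, directly or indirectly. The library names and the dependency graph below are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def transitive_dependencies(lib: str, direct: Dict[str, List[str]]) -> List[str]:
    """Return every library that `lib` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(direct.get(lib, []))
    while stack:
        dep = stack.pop()
        if dep in seen or dep == lib:
            continue  # already visited; this also guards against cycles
        seen.add(dep)
        stack.extend(direct.get(dep, []))
    return sorted(seen)


# Hypothetical dependency graph: each key lists its direct dependencies.
DIRECT = {
    "my-app": ["lib-a", "lib-b"],
    "lib-a": ["lib-c"],
    "lib-b": ["lib-c", "lib-d"],
    "lib-c": [],
    "lib-d": ["lib-a"],
}

print(transitive_dependencies("my-app", DIRECT))
```

Two direct dependencies pull in four libraries in total, which is the bookkeeping a packaging system is expected to do for us.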
Data Distribution
A data set may consist of many files
We may need to consume data sets
We may wish to publish data sets
Photo by Kelly Sikkema on Unsplash
"At implementation time each module and its inputs and outputs are well-defined, there is no confusion in the intended interface with other system modules."
"At checkout time the integrity of the module is tested independently"
"the system is maintained in modular fashion; system errors and deficiencies can be traced to specific system modules"
Photo by Tom Hermans on Unsplash
Designing Systems Programs, by Gauthier and Ponto (1970)
The "package" itself is one small part of a larger system
Hopefully a standardised file (and metadata?) format and name
We also need to consider:
Consumption
Integration
Storage
Building a new Package (a.k.a. "Packaging")
Transportation
Publication
Photo by Vlad Tchompalov on Unsplash
Essential Properties
Detailed open specification that standardises its format
Internal - What goes where, and how?
Interface - What is available from the package, and how?
External - Package file format(s) and naming convention(s)
Standardised metadata describing the package
Not implementation specific!
Content Agnostic
Desirable Properties
Ease of storage / transportation
Single file containing both data and metadata; compressible
Easily inspectable
Metadata can be easily accessed and understood
Verifiable
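The desirable properties above can be sketched in a few lines. This is an illustration only, not a proposed format: the `metadata.json` entry name, the `content/` prefix, and the choice of Zip plus SHA-256 are all assumptions made for the example.

```python
import hashlib
import io
import json
import zipfile


def build_package(metadata: dict, files: dict) -> bytes:
    """Single compressible file holding both the metadata and the content."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical convention: metadata lives in one well-known entry.
        zf.writestr("metadata.json", json.dumps(metadata, sort_keys=True))
        for path, content in files.items():
            zf.writestr("content/" + path, content)
    return buf.getvalue()


def read_metadata(package: bytes) -> dict:
    """Easily inspectable: read the metadata without extracting the content."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return json.loads(zf.read("metadata.json"))


def checksum(package: bytes) -> str:
    """Digest that a publisher could distribute alongside the package."""
    return hashlib.sha256(package).hexdigest()


def verify(package: bytes, expected: str) -> bool:
    """Verifiable: does the file we received match the published digest?"""
    return checksum(package) == expected


pkg = build_package({"name": "demo", "version": "1.0.0"}, {"hello.xq": "1 + 1"})
digest = checksum(pkg)
print(read_metadata(pkg), verify(pkg, digest))
```

A checksum only detects corruption or tampering in transit; establishing who published the package would additionally need a signature.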
Photo by Oli Zubenko on Unsplash
How do we use a package?
It's the things inside that we care about!
Do they need to be extracted from the package?
What about tools?
Do existing tools understand what a package is?
Do they even need to?
Could they be updated to support packages?
Can new tools be built to bridge between packages and existing tools?
Do we need to build new tools?
Photo by Kelly Sikkema on Unsplash
Output Format, i.e. the "Package"
Should be defined elsewhere in a "Package Specification" standard
Input Format
Unknown... Likely tool specific!
Needs to be clearly defined and documented for the users
Existing tools might be usable
e.g. Compose: mkdir, cp, tar, and gzip
New tooling could simplify
Require certain inputs
Validate inputs and outputs
Single command
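What "new tooling could simplify" might look like, as a rough sketch: one function that requires certain inputs, writes the package, and validates the output. The required file name `metadata.xml` and the `.tar.gz` output are assumptions for illustration; a real tool would take both from the Package Specification.

```python
import tarfile
from pathlib import Path

# Hypothetical required input; the real list belongs in the specification.
REQUIRED = ("metadata.xml",)


def build(src_dir: Path, out_file: Path) -> Path:
    """Validate the inputs, write a .tar.gz package, then validate the output."""
    missing = [name for name in REQUIRED if not (src_dir / name).is_file()]
    if missing:
        raise ValueError("missing required input(s): " + ", ".join(missing))

    with tarfile.open(out_file, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            # Skip directories, and the output itself if it sits inside src_dir.
            if path.is_file() and path.resolve() != out_file.resolve():
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())

    # Re-open what was written and check that the required entries are present.
    with tarfile.open(out_file, "r:gz") as tar:
        names = set(tar.getnames())
    if not set(REQUIRED) <= names:
        raise RuntimeError("package was written without its required metadata")
    return out_file
```

Usage would then be a single call such as `build(Path("my-package"), Path("my-package-1.0.0.tar.gz"))`, replacing the manual mkdir, cp, tar, and gzip steps.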
Photo by Josue Isai Ramos Figueroa on Unsplash
One file, or a data file and accompanying metadata file(s)
Amenable to standard operations, e.g. cp, scp, email, upload to Dropbox, etc.
Publish to where?
Anywhere that accepts generic files
An environment adapted to the Package format
Registry - holds metadata and redirects to the package elsewhere
Repository - holds metadata and a copy of the package
Typically provide search facilities
May provide upload/download capabilities
Possibly accessible as a human-friendly website
Possibly accessible by API - may also provide tools
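The Registry / Repository distinction can be sketched as two small classes that share search over metadata but differ in what they hand back. All class, method, and field names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Metadata:
    name: str
    version: str
    description: str


class Catalogue:
    """Shared behaviour: hold metadata and provide a search facility."""

    def __init__(self) -> None:
        self._metadata: Dict[Tuple[str, str], Metadata] = {}

    def search(self, term: str) -> List[Metadata]:
        term = term.lower()
        return [m for m in self._metadata.values()
                if term in m.name.lower() or term in m.description.lower()]


class Registry(Catalogue):
    """Holds metadata and redirects to the package stored elsewhere."""

    def __init__(self) -> None:
        super().__init__()
        self._locations: Dict[Tuple[str, str], str] = {}

    def register(self, meta: Metadata, url: str) -> None:
        self._metadata[(meta.name, meta.version)] = meta
        self._locations[(meta.name, meta.version)] = url

    def locate(self, name: str, version: str) -> str:
        return self._locations[(name, version)]


class Repository(Catalogue):
    """Holds metadata and its own copy of the package."""

    def __init__(self) -> None:
        super().__init__()
        self._packages: Dict[Tuple[str, str], bytes] = {}

    def upload(self, meta: Metadata, package: bytes) -> None:
        self._metadata[(meta.name, meta.version)] = meta
        self._packages[(meta.name, meta.version)] = package

    def download(self, name: str, version: str) -> bytes:
        return self._packages[(name, version)]
```

A Registry stays small but depends on the external location remaining available; a Repository costs storage but can always serve what it lists.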
Photo by David Trinks on Unsplash
Photo by Hush Naidoo Jade Photography on Unsplash
EXPath Candidate Module - May 2012
An unfinished draft of a potential standard
Describes itself as a "packaging system" for components: "XSLT, XQuery, and XProc"
Some tools (xrepo and Java libs) provided
Covers:
The Package:
External file format
Layout of files within the package
Metadata (including dependencies and exported components)
Resolution of Namespace URI to (local) Components
On-disk repository layout
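To make the package metadata concrete, here is a hedged sketch that reads a descriptor out of a package Zip file. It reflects my reading of the EXPath Packaging draft (a root `expath-pkg.xml` in the `http://expath.org/ns/pkg` namespace); the example package, its URIs, and the `content/` path in the demo archive are assumptions for illustration only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

PKG_NS = {"pkg": "http://expath.org/ns/pkg"}

# Hypothetical descriptor; names and URIs are made up for the example.
DESCRIPTOR = """\
<package xmlns="http://expath.org/ns/pkg"
         name="http://example.org/hello" abbrev="hello"
         version="1.0.0" spec="1.0">
  <title>Hello</title>
  <dependency package="http://example.org/util"/>
  <xquery>
    <namespace>http://example.org/hello/lib</namespace>
    <file>hello.xq</file>
  </xquery>
</package>
"""


def read_descriptor(xar: bytes) -> dict:
    """Read the package metadata without extracting the components."""
    with zipfile.ZipFile(io.BytesIO(xar)) as zf:
        root = ET.fromstring(zf.read("expath-pkg.xml"))
    return {
        "name": root.get("name"),
        "abbrev": root.get("abbrev"),
        "version": root.get("version"),
        "title": root.findtext("pkg:title", namespaces=PKG_NS),
        "dependencies": [d.get("package")
                         for d in root.findall("pkg:dependency", PKG_NS)],
        # Exported XQuery components: namespace URI -> file name
        "xquery": {x.findtext("pkg:namespace", namespaces=PKG_NS):
                   x.findtext("pkg:file", namespaces=PKG_NS)
                   for x in root.findall("pkg:xquery", PKG_NS)},
    }


# Build a tiny in-memory package to read back (layout assumed, see above).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("expath-pkg.xml", DESCRIPTOR)
    zf.writestr("content/hello.xq", "module namespace h = 'http://example.org/hello/lib';")

print(read_descriptor(buf.getvalue()))
```

Note that each exported component is described by nothing more than a URI and a file name.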
Photo by Jen Theodore on Unsplash
The Good:
We have something to discuss!
Reasonable basic Package metadata
Package is a single Zip file
Semantic Versioning 2.0.0 is used
People have used it "for real"...
Experience! i.e. We know where the pain is!
Photo by Pawtography Perth on Unsplash
The Bad:
It is completely missing:
Consumption
Building
Transportation
Publication
Integration is weakly defined
Same URI can be reused for different components
No security
Q: Is that XQuery going to delete my collection(s)?
No checksums
On-disk repo package directories are named by the non-unique package abbrev
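Two of the problems above (reused URIs and non-unique directory names) are easy to demonstrate. The sketch below assumes each package is reduced to a name, an abbrev, and the namespace URIs it exports, and that the on-disk directory is derived from the abbrev alone; the package data is invented.

```python
from collections import defaultdict
from typing import Dict, List


def find_conflicts(packages: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Report namespace URIs and abbrevs that are claimed by more than one package."""
    by_uri = defaultdict(set)
    by_abbrev = defaultdict(set)
    for pkg in packages:
        by_abbrev[pkg["abbrev"]].add(pkg["name"])
        for uri in pkg["components"]:
            by_uri[uri].add(pkg["name"])
    return {
        "uris": {u: sorted(n) for u, n in by_uri.items() if len(n) > 1},
        "abbrevs": {a: sorted(n) for a, n in by_abbrev.items() if len(n) > 1},
    }


# Hypothetical packages from two different publishers.
PACKAGES = [
    {"name": "http://example.org/utils", "abbrev": "utils",
     "components": ["http://example.org/ns/string"]},
    {"name": "http://example.com/utils", "abbrev": "utils",
     "components": ["http://example.org/ns/string", "http://example.com/ns/date"]},
]

print(find_conflicts(PACKAGES))
```

Neither conflict is prevented by the package format itself, so a resolver has no principled way to choose which component a URI refers to.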
Photo by Priscilla Du Preez 🇨🇦 on Unsplash
The Ugly:
Each package has two names: `name` and `abbrev`
Metadata lacks extensibility:
Can't add additional user oriented information
Can't add implementation specific metadata
Metadata for a component is only a URI and filename
Components are explicit in metadata, could be introspected instead?
Dependencies on processors?
Photo by Sylwia Bartyzel on Unsplash
MarkLogic - abandoned prototype.
BaseX - Supported.
Saxon - abandoned prototype.
eXist-db
Undocumented Metadata Extensions to EXPath Packaging - repo.xml and exist.xml
<license> / <copyright> / <type> / <target> / <prepare> / <finish>
Consumption / Publication: Public Application Repository (~100 Pkgs.)
Integration: autodeploy directory, XQuery functions, repository partly in database
Building: Ant, or Maven.
Photo by Karsten Winegeart on Unsplash
Photo by Mohamed Nohassi on Unsplash
We know that we have EXPath Packaging
We know what we need/want from Packaging
A modern ecosystem that encompasses:
Consumption
Integration
Storage
Building
Transportation
Publication
EXPath Packaging doesn't yet meet our requirements
Can we fix it?
or, do we need to start again?
Photo by David Kovalenko on Unsplash
Photo by Nathalie SPEHNER on Unsplash
Photo by Joshua J. Cotten on Unsplash
Photo by Joshua J. Cotten on Unsplash
If our eggs (Packages) looked like someone else's eggs...
If we put our eggs in someone else's nest (Repository)...
Would they look after them for us?
RPM / DEB / Homebrew
May be possible, but single version, and highly system oriented.
libsolv is interesting!
NPM
JavaScript only. Only one public repo. (Don't even) ask about their purported packaging standards.
RubyGems
Ruby only. Only one public repo. Packaging format is both good and bad.
Maven
JVM first, but extensible for any package format. Build centric approach.
Pip
Python only. Single version, has dependency resolution problems.
Conda
Designed for handling any language!
Photo by Steven Wright on Unsplash
Maven
Consumption: From repositories
Integration: Major IDEs
Storage: The .m2 folder
Building: The mvn tool
Transportation / Publication: To repositories
Conda
Consumption: From repositories
Integration: Major IDEs
Storage: The .conda folder (or virtualenv)
Building: Conda Forge
Transportation / Publication: To repositories
Photo by Karsten Winegeart on Unsplash
A series of distinct standards and tools
Can we design a revised EXPath Packaging Standard (v2?) that can be implemented through reuse of other packaging systems?
Can we maintain compatibility with EXPath Packaging v1?
Can we successfully implement a revised EXPath Packaging Standard (v2?):
Using Maven?
Using Conda?
If so, are they interoperable?
Photo by Roberto Nickson on Unsplash
Modular:
XML
XQuery 3.1
XSLT 2 and 3
XML Databases
In Person:
Instructor Led
Lots of Hands-on Exercises - Bring your laptop!
3 to 5 days depending on your chosen Modules
Alexandra
Adam
Tomos