Adam Retter

adam@evolvedbinary.com
 

Declarative Amsterdam
@ Amsterdam Science Park
2023-11-02


@adamretter

A Possible EXPath Pkg Version 2

About Me

  • Director and CTO of Evolved Binary

    • XML / XQuery / XSLT / RDF / SPARQL

    • Scala / Java / C++ / Rust

    • Concurrency and Scalability

  • Creator of FusionDB multi-model database

  • Contributor to Meta's (formerly Facebook's) RocksDB (7 yrs.)

  • Core contributor to eXist-db XML Database (18 yrs.)

  • Founder of EXQuery, and creator of RESTXQ

  • Was an Invited Expert to the W3C XQuery WG

  • Me: www.adamretter.org.uk

What is "Packaging"?

  • A loosely defined set of related concepts!


  • A "package" is a container of one or more things

    • Might conform to a standard size, shape, or construction

    • Might ease the storage of things

    • Might ease the transportation of things

  • The act of "packaging" is that of containing the things


  • We will focus on Packaging of:

    • Software Code

    • Data

Photo by Jiawei Zhao on Unsplash

Why do we need Packages?

  • Software Reuse

    • Modern software architecture is modular

    • We are dependent on Software Libraries

      • Each may consist of many files

      • In turn, dependent on other libraries, ad infinitum

    • We may wish to publish our App/Libraries

      • May depend on many libraries

      • May consist of many files

  • Data Distribution

    • A data set may consist of many files

    • We may need to consume data sets

    • We may wish to publish data sets

Photo by Kelly Sikkema on Unsplash

Principles of Modularity

  1. "At implementation time each module and its inputs and outputs are well-defined, there is no confusion in the intended interface with other system modules."


  2. "At checkout time the integrity of the module is tested independently"


  3. "the system is maintained in modular fashion; system errors and deficiencies can be traced to specific system modules"

Photo by Tom Hermans on Unsplash

Designing Systems Programs, by Gauthier and Ponto (1970)

Packaging is an Ecosystem

  • The "package" itself is one small part of a larger system

    • Hopefully a standardised file (and metadata?) format and name

  • We also need to consider:

    • Consumption

    • Integration

    • Storage

    • Building a new Package (a.k.a. "Packaging")

    • Transportation

    • Publication

The Package Itself

  • Essential Properties

    • Detailed open specification that standardises its format

      • Internal - What goes where, and how?

      • Interface - What is available from the package, and how?

      • External - Package file format(s) and naming convention(s)

    • Standardised metadata describing the package

      • Not implementation specific!

    • Content Agnostic

  • Desirable Properties

    • Ease of storage / transportation

      • Single file containing both data and metadata; compressible

    • Easily inspectable

      • Metadata can be easily accessed and understood

    • Verifiable

Photo by Oli Zubenko on Unsplash
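Being "easily inspectable" is straightforward when a package is a single ZIP file with its descriptor at the root (`expath-pkg.xml`, as in EXPath Packaging v1): the metadata can be read straight out of the archive without extracting anything to disk. A minimal Python sketch; the function name `inspect_package` is hypothetical and not part of any existing tool.

```python
import zipfile
import xml.etree.ElementTree as ET

PKG = "{http://expath.org/ns/pkg}"  # ElementTree's {namespace} prefix for pkg elements


def inspect_package(path: str) -> dict:
    """Return the descriptor's attributes (and title, if any) without extracting files."""
    with zipfile.ZipFile(path) as zf:
        # ZipFile.read() decompresses a single member into memory only.
        root = ET.fromstring(zf.read("expath-pkg.xml"))
    meta = dict(root.attrib)  # e.g. name, abbrev, version, spec
    title = root.findtext(f"{PKG}title")
    if title is not None:
        meta["title"] = title.strip()
    return meta
```

A package without a descriptor surfaces as a `KeyError` from `ZipFile.read`, which a real tool would want to report more helpfully.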

Integration

  • How do we use a package?

    • It's the things inside that we care about!

      • Do they need to be extracted from the package?

  • What about tools?

    • Do existing tools understand what a package is?

      • Do they even need to?

      • Could they be updated to support packages?

      • Can new tools be built to bridge between packages and existing tools?

    • Do we need to build new tools?

Photo by Kelly Sikkema on Unsplash

Building a Package

  • Output Format, i.e. the "Package"

    • Should be defined elsewhere in a "Package Specification" standard

  • Input Format

    • Unknown... Likely tool specific!

      • Needs to be clearly defined and documented for the users

  • Existing tools might be usable

    • e.g. compose them: mkdir, cp, tar, and gzip

  • New tooling could simplify

    • Require certain inputs

    • Validate inputs and outputs

    • Single command
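The "new tooling" idea (require certain inputs, validate inputs and outputs, single command) can be sketched in a few lines of Python. This assumes, as in EXPath Packaging v1, a ZIP file with an `expath-pkg.xml` descriptor at its root; which attributes are checked and the in-archive paths are illustrative assumptions, not rules taken from the specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

PKG_NS = "http://expath.org/ns/pkg"  # namespace of the EXPath package descriptor
REQUIRED_ATTRS = ("name", "abbrev", "version")  # assumed minimum for this sketch


def build_package(files: dict) -> bytes:
    """Build a ZIP package in memory from a mapping of archive path -> bytes.

    Inputs are validated first (descriptor present, well-formed, required
    attributes set); the output is re-read and CRC-checked afterwards.
    """
    descriptor = files.get("expath-pkg.xml")
    if descriptor is None:
        raise ValueError("missing expath-pkg.xml at the package root")
    root = ET.fromstring(descriptor)  # raises ET.ParseError if not well-formed
    if root.tag != f"{{{PKG_NS}}}package":
        raise ValueError(f"unexpected root element: {root.tag}")
    missing = [a for a in REQUIRED_ATTRS if not root.get(a)]
    if missing:
        raise ValueError(f"missing required attribute(s): {missing}")

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(files):  # sorted for a deterministic entry order
            zf.writestr(path, files[path])
    data = buf.getvalue()

    # Validate the output: testzip() returns the first corrupt member, or None.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        bad = zf.testzip()
    if bad is not None:
        raise ValueError(f"corrupt entry in built package: {bad}")
    return data
```

Because the result is an ordinary ZIP file, existing tools (unzip, file managers) can still open it, so the "existing tools might be usable" property is not lost.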

Transportation and Publishing

  • One file, or a data file and accompanying metadata file(s)

    • Amenable to standard operations, e.g. cp, scp, email, upload to Dropbox, etc.

  • Publish to where?

    • Anywhere that accepts generic files

    • An environment adapted to the Package format

      • Registry - holds metadata and redirects to the package elsewhere

      • Repository - holds metadata and a copy of the package

      • Typically provide search facilities

      • May provide upload/download capabilities

      • Possibly accessible as a human-friendly website

      • Possibly accessible by API - may also provide tools

Photo by David Trinks on Unsplash

Current Packaging for XML

EXPath Packaging System

  • EXPath Candidate Module - May 2012

    • https://expath.org/spec/pkg

    • An unfinished draft of a potential standard

    • Describes itself as a "packaging system" for components: "XSLT, XQuery, and XProc"

  • Some tools (xrepo and Java libs) provided

  • Covers:

    • The Package:

      • External file format

      • Layout of files within the package

      • Metadata (including dependencies and exported components)

    • Resolution of Namespace URI to (local) Components

    • On-disk repository layout

Photo by Jen Theodore on Unsplash
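The package metadata and the namespace-URI-to-component resolution can be illustrated with a minimal Python sketch. It assumes the descriptor uses the `http://expath.org/ns/pkg` namespace, with `<dependency>` elements and `<xquery>`/`<xslt>` components that pair a `<namespace>` or `<import-uri>` with a `<file>`; only those two component kinds are handled, and `PackageInfo`, `read_descriptor`, and `resolve` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PKG = "{http://expath.org/ns/pkg}"  # ElementTree's {namespace} prefix for pkg elements


@dataclass
class PackageInfo:
    name: str
    abbrev: str
    version: str
    dependencies: List[Dict[str, str]] = field(default_factory=list)
    # Component URI (XQuery namespace or XSLT import URI) -> file within the package.
    components: Dict[str, str] = field(default_factory=dict)


def read_descriptor(xml_text: str) -> PackageInfo:
    """Extract basic metadata from an expath-pkg.xml descriptor."""
    root = ET.fromstring(xml_text)
    info = PackageInfo(root.get("name"), root.get("abbrev"), root.get("version"))
    for dep in root.findall(f"{PKG}dependency"):
        # Keep all attributes as-is, e.g. package, processor, versions, semver-min.
        info.dependencies.append(dict(dep.attrib))
    # Only XQuery and XSLT components are handled in this sketch.
    for kind, uri_elem in (("xquery", "namespace"), ("xslt", "import-uri")):
        for comp in root.findall(f"{PKG}{kind}"):
            uri = (comp.findtext(f"{PKG}{uri_elem}") or "").strip()
            path = (comp.findtext(f"{PKG}file") or "").strip()
            if uri in info.components:
                # The format does not prevent a URI being reused;
                # this sketch refuses to guess which component is meant.
                raise ValueError(f"ambiguous component URI: {uri}")
            info.components[uri] = path
    return info


def resolve(info: PackageInfo, uri: str) -> Optional[str]:
    """Resolve a namespace/import URI to a file path within the package."""
    return info.components.get(uri)
```

A processor resolving across many installed packages meets the "same URI, different components" problem at a larger scale; this sketch only rejects duplicates within a single descriptor.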

EXPath Packaging System

  • The Good:

    • We have something to discuss!

    • Reasonable basic Package metadata

    • Package is a single Zip file

    • Semantic Versioning 2.0.0 is used

    • People have used it "for real"...

      • Experience! i.e. We know where the pain is!
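Dependency resolution has to compare versions, so a tool built on Semantic Versioning 2.0.0 needs its precedence rules: numeric comparison of MAJOR.MINOR.PATCH, pre-release versions ranking below the corresponding release, and build metadata ignored. A minimal Python sketch using a simplified form of the SemVer grammar (for example, it does not reject leading zeros in numeric pre-release identifiers); the function names are hypothetical.

```python
import re
from functools import cmp_to_key

# Simplified SemVer 2.0.0 grammar: MAJOR.MINOR.PATCH[-PRERELEASE][+BUILD]
SEMVER = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)


def parse_semver(version: str):
    m = SEMVER.match(version)
    if m is None:
        raise ValueError(f"not a SemVer 2.0.0 version: {version!r}")
    major, minor, patch, pre, _build = m.groups()
    # Build metadata (+...) plays no part in precedence, so it is dropped.
    return (int(major), int(minor), int(patch)), (pre.split(".") if pre else [])


def _cmp_identifier(a: str, b: str) -> int:
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num and b_num:
        return (int(a) > int(b)) - (int(a) < int(b))  # numeric comparison
    if a_num != b_num:
        return -1 if a_num else 1  # numeric identifiers sort before alphanumeric
    return (a > b) - (a < b)  # ASCII lexical comparison


def compare_semver(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 according to SemVer 2.0.0 precedence."""
    core1, pre1 = parse_semver(v1)
    core2, pre2 = parse_semver(v2)
    if core1 != core2:
        return -1 if core1 < core2 else 1
    if not pre1 or not pre2:
        # A release outranks any pre-release of the same core version.
        return (not pre1) - (not pre2)
    for a, b in zip(pre1, pre2):
        c = _cmp_identifier(a, b)
        if c:
            return c
    # All shared identifiers equal: the longer pre-release ranks higher.
    return (len(pre1) > len(pre2)) - (len(pre1) < len(pre2))


semver_key = cmp_to_key(compare_semver)  # for use with sorted(), min(), max()
```

For example, `sorted(["1.0.0", "1.0.0-rc.1", "0.9.0"], key=semver_key)` gives `["0.9.0", "1.0.0-rc.1", "1.0.0"]`.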

EXPath Packaging System

  • The Bad:

    • It is completely missing:

      • Consumption

      • Building

      • Transportation

      • Publication

    • Integration is weakly defined

    • Same URI can be reused for different components

    • No security

      • Q: Is that XQuery going to delete my collection(s)?

    • No checksums

    • On-disk repo package directories are named by the non-unique package abbrev
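EXPath Packaging v1 defines no checksums. As a minimal sketch of what a verification step could look like, assuming a registry or repository publishes a SHA-256 hex digest alongside each package (a hypothetical convention, not part of the specification), a client could check a downloaded file like this:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """True if the file's digest matches a published hex checksum (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A matching checksum only shows that the file is the one the publisher hashed; unless the expected digest arrives over a trusted channel, or is signed, it does not address the "No security" point.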

EXPath Packaging System

  • The Ugly:

    • Each package has two names: `name` and `abbrev`

    • Metadata lacks extensibility:

      • Can't add additional user oriented information

      • Can't add implementation specific metadata

      • Metadata for a component is only a URI and filename

    • Components are explicit in metadata, could be introspected instead?

    • Dependencies on processors?
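On the question of introspecting components instead of listing them: an XQuery library module already states its namespace in its module declaration (`module namespace prefix = "uri";`), which the XQuery grammar places at the start of the module, after an optional version declaration. A rough Python sketch of reading it; this is a regex heuristic rather than a parser (nested comments and comment-like text inside string literals are not handled), and `library_namespace` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# XQuery comments: (: ... :). Nested comments are NOT handled by this regex.
COMMENT = re.compile(r"\(:.*?:\)", re.DOTALL)

# Optional version declaration, then the library module declaration, e.g.
#   xquery version "3.1";
#   module namespace hello = "http://example.org/hello";
MODULE_DECL = re.compile(
    r"""^\s*(?:xquery\s+[^;]*;\s*)?"""
    r"""module\s+namespace\s+([A-Za-z_][\w.\-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')\s*;"""
)


def library_namespace(source: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, namespace URI) of an XQuery library module, else None."""
    text = COMMENT.sub(" ", source)
    m = MODULE_DECL.match(text)  # anchored: must be the module's first declaration
    if m is None:
        return None  # a main module, or something this heuristic cannot read
    uri = m.group(2) if m.group(2) is not None else m.group(3)
    return m.group(1), uri
```

Anchoring the match at the start of the module keeps `import module namespace ...` statements from being mistaken for the module's own declaration.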

EXPath Packaging System Implementations

  • MarkLogic - abandoned prototype.

  • BaseX - Supported.

  • Saxon - abandoned prototype.

  • eXist-db

    • Undocumented Metadata Extensions to EXPath Packaging - repo.xml and exist.xml

      • <license> / <copyright> / <type> / <target> / <prepare> / <finish>

    • Consumption / Publication: Public Application Repository (~100 Pkgs.)

    • Integration: autodeploy directory, XQuery functions, repository partly in database

    • Building: Ant or Maven.

EXPath Packaging in eXist-db

Where do we go from here?

  • We know that we have EXPath Packaging

  • We know what we need/want from Packaging

    • A modern ecosystem that encompasses:

      • Consumption

      • Integration

      • Storage

      • Building

      • Transportation

      • Publication

  • EXPath Packaging doesn't yet meet our requirements

    • Can we fix it?

    • Or do we need to start again?

That sounds like a lot of hard work!

Can we take a lesson from this bird?

  • If our eggs (Packages) looked like someone else's eggs...

  • If we put our eggs in someone else's nest (Repository)...

  • Would they look after them for us?

Repurposing Another System

  • RPM / DEB / Homebrew

    • May be possible, but single-version and highly system-oriented.

    • libsolv is interesting!

  • NPM

    • JavaScript only. Only one public repo. Don't (even) ask about their purported packaging standards.

  • RubyGems

    • Ruby only. Only one public repo. Packaging format is both good and bad.

  • Maven

    • JVM first, but extensible to any package format. Build-centric approach.

  • Pip

    • Python only. Single-version; has dependency-resolution problems.

  • Conda

    • Designed for handling any language!

Photo by Steven Wright on Unsplash

Two Contenders

  • Maven

    • Consumption: From repositories

    • Integration: Major IDEs

    • Storage: The .m2 folder

    • Building: The mvn tool

    • Transportation / Publication: To repositories

  • Conda

    • Consumption: From repositories

    • Integration: Major IDEs

    • Storage: The .conda folder (or virtualenv)

    • Building: Conda Forge

    • Transportation / Publication: To repositories

Future Work...

  • A series of distinct standards and tools

  • Can we design a revised EXPath Packaging Standard (v2?) that can be implemented through reuse of other packaging systems?

    • Can we maintain compatibility with EXPath Packaging v1?

  • Can we successfully implement a revised EXPath Packaging Standard (v2?):

    • Using Maven?

    • Using Conda?

    • If so, are they interoperable?
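One way to explore the "Using Maven?" question is to map an EXPath package onto Maven coordinates and the standard Maven 2 repository layout (`groupId` with dots as directories, then `artifactId/version/artifactId-version.extension`). The mapping below (reversed URI host plus all but the last path segment as `groupId`, `abbrev` as `artifactId`, `xar` as the extension) is entirely hypothetical, a starting point for discussion rather than a proposal from any existing specification.

```python
from typing import Tuple
from urllib.parse import urlparse


def to_maven_coordinates(name: str, abbrev: str, version: str) -> Tuple[str, str, str]:
    """HYPOTHETICAL mapping of an EXPath package name/abbrev/version to Maven coordinates."""
    uri = urlparse(name)
    host_parts = [p for p in (uri.hostname or "").split(".") if p]
    path_parts = [p for p in uri.path.split("/") if p]
    # Reversed host, then all path segments except the last (assumed to name the package).
    group_id = ".".join(list(reversed(host_parts)) + path_parts[:-1])
    if not group_id:
        raise ValueError(f"cannot derive a groupId from {name!r}")
    return group_id, abbrev, version


def maven_repo_path(group_id: str, artifact_id: str, version: str, extension: str = "xar") -> str:
    """Relative path of an artifact in the standard Maven 2 repository layout."""
    return "/".join(
        group_id.split(".") + [artifact_id, version, f"{artifact_id}-{version}.{extension}"]
    )
```

Using `abbrev` as the `artifactId` is only safe if it is unique within the derived `groupId`, which EXPath Packaging v1 does not guarantee; non-HTTP package names (e.g. URNs) would need a different rule.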

Questions?

January 22 - 26, 2024 / London

Our new Training Course

  • Modular:

    • XML

    • XQuery 3.1

    • XSLT 2 and 3

    • XML Databases

  • In Person:

    • Instructor Led

    • Lots of Hands-on Exercises - Bring your laptop!

    • 3 to 5 days depending on your chosen Modules

Alexandra

Adam

Tomos