Alan Paxton and Adam Retter
XML Prague
@ University of Economics, Prague
Benchmarking of XQuery
and XML Databases
Senior Engineer @Evolved Binary
Databases, Concurrency, and Performance
XML and XQuery
Contributor to eXist-db since 2021
XML Benchmarking
What does it look like now?
How does it compare?
To SQL Benchmarking
To (non-XML) NoSQL Benchmarking?
What are we proposing?
A new "framework"
What can it do for us?
XML Benchmarking Taxonomy
XML for document transformation
Efficient transformation in the large
XML for structured data storage
XML Document Stores
Documents stored unprocessed
XML Databases
Documents decomposed
Query databases
Query documents
The Growth of XML
XML Benchmarking History
XOO7 (2001)
The Initial XML Benchmarking Effort
An XML version of the OO7 benchmark
OODBMS benchmark
XML as an alternative to SQL databases
Predates the XQuery standard
Different DBs had different query languages
Problems translating queries for each system
XMark - The XML Benchmark Project (2002)
An application-level benchmark
Characterising the broad performance of a whole system
Exercises as many features as possible
Scalable workload generation
xmlgen tool (C89)
Generate an auction database
Structured/linked elements
Text-heavy elements
Numbers of components scaled by benchmark size
Fixed set of queries
Allows consistent comparison of XML DBs/stores
Michigan Benchmark (2006)
A microbenchmark
Focused on isolating detailed aspects of performance
Targeted for use as a developer tool
Compared 3 databases that support XML
Commercial XML DB
Commercial ORDBMS
Timber DB being developed at Michigan
Identified several performance issues for development
XQBench (2011)
A standardised environment for XML DB benchmarking
Aim for objective comparison of multiple systems
Configure preloaded workloads and queries
Invoke an experiment via Web API
A known workload and query set on a specific back end
E.g. run XMark at scale 10 on eXist-db
Record results of experiments
New workloads, queries, experiments
All experimental results recorded
XSLT Benchmarking
Extensible Stylesheet Language Transformations
Shares use cases with XQuery
XPath in common
Processing model
Take part(s) of input document(s)
Transform them
Create output document(s)
XT-Speedo (2014)
An XSLT benchmark
XSLT-focused workloads differ from XQuery workloads
Partly by user choice
Pragmatic collection of XSLT-focused documents, stylesheets
Stylesheet compilation is a factor
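Why stylesheet compilation matters can be sketched with the JDK's built-in XSLT 1.0 processor: compiling to a Templates object is a separate, reusable step from running a transformation, so a benchmark can time the two apart. This is a minimal sketch, not XT-Speedo's harness; the class name, stylesheet, and input document are hypothetical.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

class XsltCompileSketch {
    // Hypothetical stylesheet: emit the text of each <name> element followed by ';'.
    static final String STYLESHEET = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <xsl:for-each select="//name"><xsl:value-of select="."/>;</xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    /** Compile once; the resulting Templates object can be reused for many runs. */
    static Templates compile(String xslt) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Run one transformation using an already compiled stylesheet. */
    static String run(Templates compiled, String xml) {
        try {
            StringWriter out = new StringWriter();
            compiled.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        Templates compiled = compile(STYLESHEET);          // compilation cost
        long t1 = System.nanoTime();
        String result = run(compiled,                      // transformation cost
                "<people><name>Ann</name><name>Bob</name></people>");
        long t2 = System.nanoTime();
        System.out.printf("compile %d us, transform %d us, output %s%n",
                (t1 - t0) / 1000, (t2 - t1) / 1000, result);
    }
}
```

Whether a benchmark reports compilation and transformation separately or together is a design choice; the sketch only shows that the two can be measured independently.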
SQL Benchmark Frameworks
The Concept
Benchmark Framework or Testbed
OLTP-Bench (2013)
Testbed (their term) for SQL Databases
Driving relational DBMSs via standard interfaces
Tightly controlling the transaction mixture, request rate, and access distribution of the workload
Automatically gathering a rich set of performance and resource statistics
SQL Testbed Requirements
Synthetic and Real Data & Workloads
Mixed and Evolving Workloads
Change the rate, composition, and access distribution of workloads dynamically
Simulate real-life events and evolving workloads
Fine-Grained Rate Control
Control request rates with precision
Flexible Workload Generation
Open, closed, and semi-open loop systems
Transactional Throughput Scalability
Without being restricted by clients
From OLTP-Bench
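The difference between open and closed loop generation, and what rate control means in practice, can be sketched by computing when each operation is dispatched. This is an illustrative model under simple assumptions (one client, fixed think time), not OLTP-Bench code; all names are hypothetical.

```java
class LoopModelSketch {
    /** Convert a target rate into the gap between dispatches, in nanoseconds. */
    static long intervalForRate(double opsPerSecond) {
        return Math.round(1_000_000_000.0 / opsPerSecond);
    }

    /**
     * Open loop: operation i is due at i * interval, regardless of how long
     * earlier operations took. A slow database does not slow the arrival rate.
     */
    static long[] openLoopStarts(int ops, long intervalNanos) {
        long[] starts = new long[ops];
        for (int i = 0; i < ops; i++) {
            starts[i] = i * intervalNanos;
        }
        return starts;
    }

    /**
     * Closed loop (single client): operation i starts only after operation
     * i-1 has completed and the client has waited for its think time.
     */
    static long[] closedLoopStarts(long[] serviceNanos, long thinkNanos) {
        long[] starts = new long[serviceNanos.length];
        long clock = 0;
        for (int i = 0; i < serviceNanos.length; i++) {
            starts[i] = clock;
            clock += serviceNanos[i] + thinkNanos;
        }
        return starts;
    }
}
```

In the closed loop model a single slow operation delays every later one, which is why the achieved request rate depends on the system under test; the open loop schedule is fixed in advance.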
NoSQLBench (2020)
Testbed for NoSQL Database benchmarking
Pluggable architecture
Adapters for CQL, MongoDB, DynamoDB, etc.
Concurrent workload dispatch
Scriptable workloads (YAML)
Define operation templates
Define distribution of operation dispatch
Virtual Data Sets (VDS) for op field values
Rich set of functional generators
Value a function of the operation cycle number
Efficient generation and substitution into ops
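The idea that a value is a function of the operation cycle number can be sketched in plain Java: the same cycle always yields the same value, so no data set needs to be stored. This is a stand-in using java.util.function.LongFunction, not the NoSQLBench VDS library; the mixing step, the salts, and the name lists are assumptions for illustration.

```java
import java.util.function.LongFunction;

class CycleFunctionSketch {
    /** SplitMix64-style mixing step: spreads consecutive cycle numbers over the long range. */
    static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** A generator that picks one of the given values; the salt decorrelates generators. */
    static LongFunction<String> pick(long salt, String... values) {
        return cycle -> values[(int) Long.remainderUnsigned(mix(cycle ^ salt), values.length)];
    }

    // Hypothetical value lists, standing in for a real names data set.
    static final LongFunction<String> FIRST = pick(1, "Ada", "Grace", "Alan", "Edsger");
    static final LongFunction<String> LAST = pick(2, "Lovelace", "Hopper", "Turing", "Dijkstra");

    /** Generators compose: a full name is built from two simpler generators. */
    static final LongFunction<String> FULL_NAME =
            cycle -> FIRST.apply(cycle) + " " + LAST.apply(cycle);

    public static void main(String[] args) {
        for (long cycle = 0; cycle < 5; cycle++) {
            System.out.println(cycle + " -> " + FULL_NAME.apply(cycle));
        }
    }
}
```

Because each value depends only on the cycle number, concurrent workers can generate their own slices of the workload without coordinating.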
XQuery and XML Databases
with NoSQLBench
NoSQLBench - XML Extensions
XML:DB API Adapter
Drive XML Database(s) from NoSQLBench
YAML scripted XML tests
VDS-based XML Generation
Flexible synthetic-XML generation in Java 21
Reproduced XMark's xmlgen
Canonical auction site example
Roughly half the speed of the C89 (ANSI C) version (420s vs 221s at scale factor 100)
Scope for improvement
Workload Configuration
# single shot operation to clear / reset the database contents
# elemental driver requires an endpoint parameter
schema: >
run driver=elemental
# different ops in the "write" schema occur on intervals, at the ratios declared by the ops
# elemental driver requires an endpoint parameter
write: >
run driver=elemental
Configuration Bindings
# Use example built in VDS functions
user_id: ToHashedUUID(); ToString() -> String
created_on: Uniform(1262304000,1577836800) -> long
gender: WeightedStrings('M:10;F:10;O:1')
city: Cities()
# We added our own function to select a paragraph of a Gutenberg book
text: WarAndPeace()
# Create a full name by joining first and last names
full_name: ListSized(FixedValue(2), FirstNames(), LastNames()); Join(' ')
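How a weighted binding such as 'M:10;F:10;O:1' can turn a cycle number into a value may be sketched as follows. This is an assumed reading of the spec format (label:weight pairs separated by semicolons) and a hypothetical class, not the NoSQLBench implementation of WeightedStrings.

```java
class WeightedChoiceSketch {
    private final String[] labels;
    private final long[] cumulative;   // running totals of the weights
    private final long total;

    /** Parse a spec of the assumed form "label:weight;label:weight". */
    WeightedChoiceSketch(String spec) {
        String[] pairs = spec.split(";");
        labels = new String[pairs.length];
        cumulative = new long[pairs.length];
        long sum = 0;
        for (int i = 0; i < pairs.length; i++) {
            String[] labelAndWeight = pairs[i].split(":");
            labels[i] = labelAndWeight[0].trim();
            sum += Long.parseLong(labelAndWeight[1].trim());
            cumulative[i] = sum;
        }
        total = sum;
    }

    long totalWeight() {
        return total;
    }

    /** Map a position in [0, totalWeight) to the label whose weight band contains it. */
    String at(long position) {
        for (int i = 0; i < labels.length; i++) {
            if (position < cumulative[i]) {
                return labels[i];
            }
        }
        throw new IllegalArgumentException("position out of range: " + position);
    }

    /** Scramble the cycle number, then reduce it to a position in the weight range. */
    String forCycle(long cycle) {
        long scrambled = cycle * 0x9E3779B97F4A7C15L;
        return at(Long.remainderUnsigned(scrambled, total));
    }
}
```

With the weights above, positions 0 to 9 map to M, 10 to 19 to F, and 20 to O, so O is chosen for about 1 in 21 positions.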
XML:DB API Adapter op
# Insert a pseudo-random person record
op: |
xquery version "3.1";
let $record :=
<person id="{user_id}" seq="{seq_key}">
xmldb:store("/{collection}/testnb/beta", "{random_key}", $record)
XML Generation
Use the VDS part of NoSQLBench as a Java library
Declare bindings in code
Syntactic sugar around XML generation
Emit an element with value generated by the VDS function
Virtual Data Set Based Tool
LongFunction<String> fullNames = (LongFunction<String>) new Flow(
new ListSized(
new FixedValue(2),
new FirstNames(),
new LastNames()
), new Join<String>(" "));
LongFunction<String> education = new WeightedStrings(
"Other: 3, High School: 10, College: 10, Graduate School: 4");
element("name", fullNames);
element("education", education);
NoSQLBench XML Generation
Compose new VDS functions
LongFunction<String> emailNames = new AppendList<>(new ListFunctions(
new FirstNames(),
new FixedValue(" "),
new LastNames(),
new FixedValue(" mailto:"),
new LastNames(),
new FixedValue("@"),
new AlphaNumericString(4),
new FixedValue("."),
new WeightedStrings("com:10;org:8;")), "");
NoSQLBench XML Generation
Parameterise VDS functions from configuration
var openAuctions = configuration.element(Configuration.Elements.OpenAuctions);
this.auctionRef = new ListSizedStepped(
new Zipf(10, 1.75),
new HashRange(openAuctions.from(),;
NoSQLBench XML Generation
Create nested elements with lambdas
element("mailbox", this::mailbox);
protected void mailbox() {
var num = next(emailCount);
for (int i = 0; i < num; i++) {
element("mail", () -> {
element("from", emailNames);
element("to", emailNames);
element("date", Formatters.dateFmt.format(next(emailTime)));
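The nested-element style used above can be sketched with a small emitter in which element(...) is overloaded for a literal value, a cycle-driven generator, or a lambda that writes child elements. This is a guess at the shape of such a helper, not the xmlgen2 source; the class and its methods are hypothetical.

```java
import java.util.function.LongFunction;

class XmlEmitSketch {
    private final StringBuilder out = new StringBuilder();
    private long cycle;

    XmlEmitSketch(long startCycle) {
        this.cycle = startCycle;
    }

    /** Draw the next value from a generator and advance the cycle. */
    <T> T next(LongFunction<T> generator) {
        return generator.apply(cycle++);
    }

    /** Leaf element with a literal text value. */
    void element(String name, String text) {
        out.append('<').append(name).append('>')
           .append(escape(text))
           .append("</").append(name).append('>');
    }

    /** Leaf element whose value comes from a generator. */
    void element(String name, LongFunction<String> generator) {
        element(name, next(generator));
    }

    /** Container element: the body lambda emits the children. */
    void element(String name, Runnable body) {
        out.append('<').append(name).append('>');
        body.run();
        out.append("</").append(name).append('>');
    }

    /** Escape the characters that are unsafe in element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    String result() {
        return out.toString();
    }
}
```

A caller nests elements by passing a lambda, for example `x.element("mail", () -> { x.element("from", c -> "a" + c); })`, so the shape of the code mirrors the shape of the generated XML.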
xmlgen vs xmlgen2
Scale factor 100
A large data set
Apple MacBook Pro, M1 Max CPU, 64GB RAM
xmlgen (C89 / ANSI C)
221s generation time
11,758MB XML file
xmlgen2 (Java 21, VDS library)
420s generation time
13,235MB XML file
Single threaded
Lots of room for performance improvements
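Because the two tools produced files of different sizes, elapsed time alone slightly overstates the gap; throughput in MB/s is a fairer comparison. The arithmetic below uses only the figures on this slide; the class name is hypothetical.

```java
class GenThroughputSketch {
    /** Generation throughput in megabytes per second. */
    static double mbPerSecond(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        double xmlgen = mbPerSecond(11_758, 221);    // C89 generator
        double xmlgen2 = mbPerSecond(13_235, 420);   // Java 21 generator
        System.out.printf("xmlgen:  %.1f MB/s%n", xmlgen);
        System.out.printf("xmlgen2: %.1f MB/s%n", xmlgen2);
        System.out.printf("xmlgen2 / xmlgen throughput: %.2f%n", xmlgen2 / xmlgen);
        System.out.printf("xmlgen2 / xmlgen elapsed time: %.2f%n", 420.0 / 221.0);
    }
}
```

On these numbers xmlgen writes about 53 MB/s and xmlgen2 about 32 MB/s, so xmlgen2 reaches roughly 59% of the C throughput while taking about 1.9 times as long.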
Next Steps
Round trip test
Under NoSQLBench control (YAML script)
Generate XML
Load it into an XML:DB API database
Perform a set of simple benchmark queries
YAML Scripts for common benchmarks
NoSQLBench (our fork with XML:DB API driver):
Thank You
Modern Benchmarking of XQuery and XML Databases
By Adam Retter
Presentation given for XML Prague - 8 June 2024 - University of Economics, Prague