Alan Paxton and Adam Retter
tech@evolvedbinary.com
XML Prague
@ University of Economics, Prague
2024-06-08
github.com/evolvedbinary
Modern Benchmarking of XQuery and XML Databases
Alan Paxton
alan@evolvedbinary.com
@alanpaxton
- Senior Engineer @Evolved Binary
- Databases, Concurrency, and Performance
- Developer
  - Java
  - C++
  - XML and XQuery
- Contributor to eXist-db since 2021
XML Benchmarking
- What does it look like now?
  - Taxonomy
  - History
- How does it compare?
  - To SQL Benchmarking
  - To (non-XML) NoSQL Benchmarking?
- What are we proposing?
  - A new "framework"
  - What can it do for us?
XML Benchmarking Taxonomy
- XML for document transformation
  - XSLT
    - Efficient transformation in the large
- XML for structured data storage
  - XML Document Stores
    - Documents stored unprocessed
  - XML Databases
    - Documents decomposed
- XQuery
  - Query databases
  - Query documents
The Growth of XML
XML Benchmarking History
XOO7 (2001)
- The Initial XML Benchmarking Effort
- An XML version of the OO7 benchmark
  - OODBMS benchmark
- XML as an alternative to SQL databases
- Pre-XQuery standard
  - Different DBs had different query languages
  - Problems translating queries for each system
XMark - The XML Benchmark Project (2002)
- An application-level benchmark
  - Characterising the broad performance of a whole system
  - Exercises as many features as possible
- Scalable workload generation
  - xmlgen tool (C89)
  - Generate an auction database
    - Structured/linked elements
    - Text-heavy elements
    - Numbers of components scaled by benchmark size
- Queries
  - Fixed set of queries
  - Allows consistent comparison of XML DBs/stores
Michigan Benchmark (2006)
- A microbenchmark
  - Focused on isolating detailed aspects of performance
  - Targeted for use as a developer tool
- Compared 3 databases that support XML
  - Commercial XML DB
  - Commercial ORDBMS
  - Timber DB being developed at Michigan
    - Identified several performance issues for development
XQBench (2011)
- A standardised environment for XML DB benchmarking
- Aim for objective comparison of multiple systems
- Configure preloaded workloads and queries
- Invoke an experiment via Web API
  - A known workload and query set on a specific back end
  - E.g. run XMark at scale 10 on eXist-db
- Record results of experiments
- Extensible
  - New workloads, queries, experiments
- All experimental results recorded
XSLT Benchmarking
- Extensible Stylesheet Language Transformations
  - Shares use cases with XQuery
    - XPath in common
  - Processing model
    - Take part(s) of input document(s)
    - Transform them
    - Create output document(s)
XT-Speedo (2014)
- An XSLT benchmark
- XSLT-focused workloads differ from XQuery workloads
  - Partly by user choice
  - Pragmatic collection of XSLT-focused documents and stylesheets
- Stylesheet compilation is a factor
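Why stylesheet compilation matters for timing can be seen with the JDK's built-in XSLT support (javax.xml.transform). The sketch below is not taken from XT-Speedo; the stylesheet, the input document, and the class name XsltCompileSketch are hypothetical. It compiles the stylesheet once into a Templates object and times compilation separately from the transformation itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltCompileSketch {
    // Hypothetical stylesheet: writes each person's name on its own line.
    static final String STYLESHEET = """
            <xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
              <xsl:output method="text"/>
              <xsl:template match="/">
                <xsl:for-each select="people/person">
                  <xsl:value-of select="name"/>
                  <xsl:text>&#10;</xsl:text>
                </xsl:for-each>
              </xsl:template>
            </xsl:stylesheet>
            """;

    // Hypothetical input document.
    static final String INPUT =
            "<people><person><name>Ada</name></person>"
            + "<person><name>Alan</name></person></people>";

    /** Compile the stylesheet once; the Templates object can be reused. */
    static Templates compile(String stylesheet) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(stylesheet)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Run one transformation with an already compiled stylesheet. */
    static String transform(Templates compiled, String xml) {
        try {
            StringWriter out = new StringWriter();
            compiled.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        Templates compiled = compile(STYLESHEET);
        long t1 = System.nanoTime();
        String result = transform(compiled, INPUT);
        long t2 = System.nanoTime();
        // Report the two phases separately so compile cost is visible.
        System.out.printf("compile %d us, transform %d us%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000);
        System.out.print(result);
    }
}
```

A benchmark that only reports the total would fold the one-off compile cost into every measurement, so reporting the two phases separately is probably the fairer comparison.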
SQL Benchmark Frameworks
- The Concept
  - Benchmark Framework or Testbed
OLTP-Bench (2013)
- Testbed (their term) for SQL Databases
- Goals
  - Driving relational DBMSs via standard interfaces
  - Tightly controlling the transaction mixture, request rate, and access distribution of the workload
  - Automatically gathering a rich set of performance and resource statistics
SQL Testbed Requirements
- Synthetic and Real Data & Workloads
- Mixed and Evolving Workloads
  - Change the rate, composition, and access distribution of workloads dynamically
  - Simulate real-life events and evolving workloads
- Fine-Grained Rate Control
  - Control request rates with precision
- Flexible Workload Generation
  - Open, closed, and semi-open loop systems
- Transactional Throughput Scalability
  - Without being restricted by clients
From OLTP-Bench
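The difference between open and closed loop workload generation can be illustrated with a small simulation. This is a sketch under simplifying assumptions (a single server, FIFO queueing, made-up service times), not code from OLTP-Bench; the class name LoopModelSketch is hypothetical. In a closed loop the client waits for each response before sending the next request, so a slow request delays later ones without adding to their measured latency. In an open loop requests arrive at a fixed rate, so a slow request causes queueing that shows up in the latencies that follow.

```java
import java.util.Arrays;

final class LoopModelSketch {
    /** Closed loop: the next request is sent only when the previous one completes. */
    static long[] closedLoopLatencies(long[] serviceMs) {
        // No request ever waits in a queue, so latency equals service time.
        return serviceMs.clone();
    }

    /** Open loop: request i arrives at i * intervalMs, whatever the server is doing. */
    static long[] openLoopLatencies(long[] serviceMs, long intervalMs) {
        long[] latencies = new long[serviceMs.length];
        long serverFreeAt = 0;
        for (int i = 0; i < serviceMs.length; i++) {
            long arrival = i * intervalMs;
            long start = Math.max(arrival, serverFreeAt); // wait if the server is busy
            serverFreeAt = start + serviceMs[i];
            latencies[i] = serverFreeAt - arrival;        // queueing time + service time
        }
        return latencies;
    }

    public static void main(String[] args) {
        long[] serviceMs = {10, 50, 10, 10}; // hypothetical service times; one slow request
        System.out.println("closed: " + Arrays.toString(closedLoopLatencies(serviceMs)));
        System.out.println("open:   " + Arrays.toString(openLoopLatencies(serviceMs, 20)));
        // closed: [10, 50, 10, 10]
        // open:   [10, 50, 40, 30]
    }
}
```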
NoSQLBench (2020)
- Testbed for NoSQL Database benchmarking
- Pluggable architecture
  - Adapters for CQL, MongoDB, DynamoDB, etc.
- Concurrent workload dispatch
- Scriptable workloads (YAML)
  - Define operation templates
  - Define distribution of operation dispatch
- Virtual Data Sets (VDS) for op field values
  - Rich set of functional generators
  - Each value is a function of the operation cycle number
  - Efficient generation and substitution into ops
  - Repeatable
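The idea that a value is a pure function of the cycle number, and therefore repeatable, can be sketched in plain Java. This does not use the NoSQLBench VDS library; the class CycleFunctionSketch, the name lists, and the salts are hypothetical, and the bit mixer is the well-known SplitMix64 finaliser.

```java
import java.util.function.LongFunction;

final class CycleFunctionSketch {
    // Hypothetical value pools; a real data set would be far larger.
    static final String[] FIRST_NAMES = {"Ada", "Alan", "Grace", "Edsger"};
    static final String[] LAST_NAMES = {"Lovelace", "Turing", "Hopper", "Dijkstra"};

    /** SplitMix64 finaliser: scrambles the cycle number into a well-spread 64-bit value. */
    static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Pick one value per cycle; the salt decorrelates different fields. */
    static LongFunction<String> pick(String[] values, long salt) {
        return cycle -> values[(int) Long.remainderUnsigned(mix(cycle ^ salt), values.length)];
    }

    /** Compose two generators into a full name, still a pure function of the cycle. */
    static LongFunction<String> fullName() {
        LongFunction<String> first = pick(FIRST_NAMES, 1L);
        LongFunction<String> last = pick(LAST_NAMES, 2L);
        return cycle -> first.apply(cycle) + " " + last.apply(cycle);
    }

    public static void main(String[] args) {
        LongFunction<String> names = fullName();
        for (long cycle = 0; cycle < 5; cycle++) {
            System.out.println(cycle + " -> " + names.apply(cycle));
        }
    }
}
```

Because no state is kept between calls, any thread can generate the value for any cycle, and a rerun produces the same data set.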
XQuery and XML Databases Benchmarking with NoSQLBench
NoSQLBench - XML Extensions
- XML:DB API Adapter
  - Drive XML Database(s) from NoSQLBench
    - YAML scripted XML tests
- VDS-based XML Generation
  - Flexible synthetic-XML generation in Java 21
  - Reproduced XMark's xmlgen
    - Canonical auction site example
    - Scalable
  - Currently takes about 1.9x as long as the C89 (ANSI C) version at scale factor 100 (420s vs 221s)
    - Scope for improvement
Workload Configuration
scenarios:
  default:
    # single shot operation to clear / reset the database contents
    # elemental driver requires an endpoint parameter
    schema: >
      run driver=elemental
      tags==block:"schema.*"
      threads==1
      cycles==UNDEF
      endpoint=xmldb://localhost:3808
    # different ops in the "write" schema occur on intervals, at the ratios declared by the ops
    # elemental driver requires an endpoint parameter
    write: >
      run driver=elemental
      tags==block:"write.*"
      cycles===TEMPLATE(write-cycles,TEMPLATE(docscount,1000))
      seq=interval
      threads=auto
      errors=timer,warn
      endpoint=xmldb://localhost:3808
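One plausible reading of "ops occur on intervals, at the ratios declared by the ops" is sketched below. This is an assumption about the behaviour, not NoSQLBench's actual sequencer code: an op with ratio r is given r evenly spaced positions in the unit interval, and all positions are then merged in order, with ties broken by declaration order. The class IntervalSequenceSketch and the op names A, B, C are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IntervalSequenceSketch {
    private record Slot(double position, int opOrder, String op) {}

    /** Build one pass of the op sequence from op name -> ratio (declaration order kept). */
    static List<String> sequence(Map<String, Integer> ratios) {
        List<Slot> slots = new ArrayList<>();
        int order = 0;
        for (Map.Entry<String, Integer> e : ratios.entrySet()) {
            int ratio = e.getValue();
            for (int i = 0; i < ratio; i++) {
                // Spread this op's occurrences evenly over [0, 1).
                slots.add(new Slot((double) i / ratio, order, e.getKey()));
            }
            order++;
        }
        // Merge by position; ops sharing a position keep declaration order.
        slots.sort(Comparator.comparingDouble(Slot::position)
                .thenComparingInt(Slot::opOrder));
        return slots.stream().map(Slot::op).toList();
    }

    public static void main(String[] args) {
        Map<String, Integer> ratios = new LinkedHashMap<>();
        ratios.put("A", 4);
        ratios.put("B", 2);
        ratios.put("C", 1);
        System.out.println(sequence(ratios)); // [A, B, C, A, A, B, A]
    }
}
```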
Configuration Bindings
# Use example built in VDS functions
user_id: ToHashedUUID(); ToString() -> String
created_on: Uniform(1262304000,1577836800) -> long
gender: WeightedStrings('M:10;F:10;O:1')
city: Cities()
# We added our own function to select a paragraph of a Gutenberg book
text: WarAndPeace()
# Create a full name by joining first and last names
full_name: ListSized(FixedValue(2), FirstNames(), LastNames()); Join(' ')
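How a weighted binding such as 'M:10;F:10;O:1' might map a cycle number to a value can be sketched as follows. This is a hypothetical stand-in, not the NoSQLBench WeightedStrings implementation: the class WeightedChoiceSketch, the pickAt helper, and the multiplicative scramble of the cycle number are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

final class WeightedChoiceSketch implements LongFunction<String> {
    private final List<String> values = new ArrayList<>();
    private final List<Long> cumulative = new ArrayList<>();
    private long total = 0;

    /** Parse a spec of the form "value:weight;value:weight;...". */
    WeightedChoiceSketch(String spec) {
        for (String part : spec.split(";")) {
            String[] kv = part.split(":");
            total += Long.parseLong(kv[1].trim());
            values.add(kv[0].trim());
            cumulative.add(total); // running sum of weights
        }
    }

    /** Select the value whose cumulative weight range contains point, 0 <= point < total. */
    String pickAt(long point) {
        for (int i = 0; i < values.size(); i++) {
            if (point < cumulative.get(i)) {
                return values.get(i);
            }
        }
        throw new IllegalArgumentException("point out of range: " + point);
    }

    /** Deterministic choice for a cycle: scramble the cycle, then reduce to [0, total). */
    @Override
    public String apply(long cycle) {
        long scrambled = cycle * 0x9E3779B97F4A7C15L;
        return pickAt(Long.remainderUnsigned(scrambled, total));
    }

    public static void main(String[] args) {
        WeightedChoiceSketch gender = new WeightedChoiceSketch("M:10;F:10;O:1");
        for (long cycle = 0; cycle < 5; cycle++) {
            System.out.println(cycle + " -> " + gender.apply(cycle));
        }
    }
}
```

With weights 10, 10 and 1 the points 0 to 9 select M, 10 to 19 select F, and 20 selects O.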
XML:DB API Adapter op
# Insert a pseudo-random person record
op: |
  xquery version "3.1";
  ...
  let $record :=
    <person id="{user_id}" seq="{seq_key}">
      <name>{full_name}</name>
      <text>{alttext}</text>
    </person>
  return
    xmldb:store("/{collection}/testnb/beta", "{random_key}", $record)
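The substitution of binding values into an op template for a given cycle can be sketched as below. This is an illustrative assumption about the mechanism, not the adapter's real code: the class OpTemplateSketch, the render method, and the example bindings are hypothetical. Placeholder names without a binding are left untouched, and no XML escaping is applied.

```java
import java.util.Map;
import java.util.function.LongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OpTemplateSketch {
    // Matches {name} where name is made of letters, digits, or underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    /** Replace each {name} with the value its binding produces for this cycle. */
    static String render(String template,
                         Map<String, LongFunction<String>> bindings,
                         long cycle) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            LongFunction<String> binding = bindings.get(m.group(1));
            // Unknown names are copied through unchanged.
            String value = (binding == null) ? m.group() : binding.apply(cycle);
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical bindings standing in for VDS functions.
        Map<String, LongFunction<String>> bindings = Map.of(
                "user_id", cycle -> "u" + cycle,
                "full_name", cycle -> "Name" + cycle);
        String template = "<person id=\"{user_id}\"><name>{full_name}</name></person>";
        System.out.println(render(template, bindings, 7));
        // <person id="u7"><name>Name7</name></person>
    }
}
```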
XML Generation
- Use the VDS part of NoSQLBench as a Java library
  - Declare bindings in code
- Syntactic sugar around XML generation
  - Emit an element with a value generated by the VDS function
Virtual Data Set Based Tool
LongFunction<String> fullNames = (LongFunction<String>) new Flow(
    new ListSized(
        new FixedValue(2),
        new FirstNames(),
        new LastNames()
    ), new Join<String>(" "));
LongFunction<String> education = new WeightedStrings(
    "Other: 3, High School: 10, College: 10, Graduate School: 4");
element("name", fullNames);
element("education", education);
NoSQLBench XML Generation
Compose new VDS functions
LongFunction<String> emailNames = new AppendList<>(new ListFunctions(
    new FirstNames(),
    new FixedValue(" "),
    new LastNames(),
    new FixedValue(" mailto:"),
    new LastNames(),
    new FixedValue("@"),
    new AlphaNumericString(4),
    new FixedValue("."),
    new WeightedStrings("com:10;org:8;co.uk:1")), "");
NoSQLBench XML Generation
- Parameterise VDS functions from configuration
var openAuctions = configuration.element(Configuration.Elements.OpenAuctions);
this.auctionRef = new ListSizedStepped(
    new Zipf(10, 1.75),
    new HashRange(openAuctions.from(), openAuctions.to()));
NoSQLBench XML Generation
- Create nested elements with lambdas
element("mailbox", this::mailbox);
protected void mailbox() {
    var num = next(emailCount);
    for (int i = 0; i < num; i++) {
        element("mail", () -> {
            element("from", emailNames);
            element("to", emailNames);
            element("date", Formatters.dateFmt.format(next(emailTime)));
            text.build();
        });
    }
}
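The lambda-based nesting above can be mimicked with a very small emitter. This is a hypothetical sketch of the pattern, not the xmlgen2 code: the class XmlEmitterSketch, its two element overloads, and the demo values are assumptions. The Runnable overload writes the start tag, runs the lambda to emit the children, then writes the end tag, so the Java call structure mirrors the XML structure.

```java
final class XmlEmitterSketch {
    private final StringBuilder out = new StringBuilder();

    /** Emit an element whose content is produced by running the lambda. */
    void element(String name, Runnable content) {
        out.append('<').append(name).append('>');
        content.run();
        out.append("</").append(name).append('>');
    }

    /** Emit an element with escaped text content. */
    void element(String name, String text) {
        element(name, () -> out.append(escape(text)));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    String result() {
        return out.toString();
    }

    /** Build a two-mail mailbox, mirroring the shape of the slide example. */
    static String demo() {
        XmlEmitterSketch xml = new XmlEmitterSketch();
        xml.element("mailbox", () -> {
            for (int i = 1; i <= 2; i++) {
                String sender = "user" + i; // effectively final, so the lambda can capture it
                xml.element("mail", () -> {
                    xml.element("from", sender);
                    xml.element("to", "a&b");
                });
            }
        });
        return xml.result();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```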
xmlgen vs xmlgen2
- Scale factor 100
  - A large data set
- Apple MacBook Pro, M1 Max CPU, 64GB RAM
- xmlgen (C89 / ANSI C)
  - 221s generation time
  - 11,758MB XML file
- xmlgen2 (Java 21, VDS library)
  - 420s generation time
  - 13,235MB XML file
  - Single threaded
  - Lots of room for performance improvements
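Because the two tools produce files of different sizes, throughput is a useful companion to raw generation time. The arithmetic below uses only the figures from this slide; the class name GenerationRateSketch is hypothetical.

```java
import java.util.Locale;

final class GenerationRateSketch {
    static double mbPerSecond(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    static String summary() {
        double xmlgen = mbPerSecond(11_758, 221);   // C89 xmlgen
        double xmlgen2 = mbPerSecond(13_235, 420);  // Java 21 xmlgen2
        double timeRatio = 420.0 / 221.0;           // how much longer xmlgen2 takes
        return String.format(Locale.ROOT,
                "xmlgen %.1f MB/s, xmlgen2 %.1f MB/s, time ratio %.2f",
                xmlgen, xmlgen2, timeRatio);
    }

    public static void main(String[] args) {
        System.out.println(summary());
        // xmlgen 53.2 MB/s, xmlgen2 31.5 MB/s, time ratio 1.90
    }
}
```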
Next Steps
- Round trip test
  - Under NoSQLBench control (YAML script)
    - Generate XML
    - Load it into an XML:DB API database
    - Perform a set of simple benchmark queries
- YAML Scripts for common benchmarks
  - Reproducible
  - Distributable
  - Automatable
Questions?
- NoSQLBench (our fork with XML:DB API driver): https://github.com/evolvedbinary/nosqlbench
Thank You
Presentation given for XML Prague - 8 June 2024 - University of Economics, Prague