XML Prague
2019-02-08
Personal Concerns
Open Source NXDs problems/limitations are not being addresed
Commercial NXDs are Expensive and not Open Source
New NoSQL (JSON) document db are out-innovating us
10 Years invested in Open Source NXD, unhappy with progress
Commercial Concerns from Customers
Help! Our Open Source NXD sometimes:
Crashes and Corrupts the database
Stops responding
Won't Scale with new servers/users
Reported by Users
Stability - Responsiveness / Deadlocks
Operational Reliability - Backup / Corruption / Fail-over
Missing Feature - Key/Value Metadata for Documents
Missing Feature - Triple/Graph linking for inter/cross-document references
Recognised by Developers
Correctness - Crash Recovery / Deadlock Avoidance / Deadlock Resolution / ACID
Performance - Reducing Contention / Avoiding System Maintenance mutex
Missing Features - Multi Model / Clustering
Project Health?
Issues - Rate of decay, i.e. Open vs Closed over time
Attracts new contributors?
Attracts new and varied users?
How do contributors pay their bills?
Contributor Constraints?
How long to get PRs reviewed?
Open to radical changes? Incremental vs Big-bang?
Other developers with time/knowledge to review PRs
License
Business friendly? CLA?
Reputation - Perceived or otherwise
Project "Granite"
Research and Development
Primary Focus on Correctness and Stability
Never become unresponsive
Never crash
Never lose data or corrupt the database
Should become Open Source
Should be appealing to Commercial enterprises
Open Source license choice(s) vs Revenue opportunities
Don't reinvent wheels!
Reuse - Faster time to market
Developers know eXist-db... Fork it!
Why?
We don't trust it's correctness
Old and Creaky? - (dbXML ~2001)
Improved with caching and journaling
Not Scalable - single-threaded read/write
Classic B+ Tree
Why not fix it?
Newer/better algorthms exist - B-link Tree, Bw Tree, etc.
We want a giant-leap, not an incremental improvement
1 2 3 4 5 6 7 8 9 10 |
<fruits> <fruit> <name>Apple</name> <colour>Green</colour> </fruit> <fruit> <name>Banana</name> <colour>Yellow</colour> </fruit> </fruits> |
fruits.xml: docId=6
We won't develop our own!
It's hard (to get correct)!
Other well-resourced projects available for reuse
We would rather focus on the larger DBMS
Choosing a suitable engine
Other Java Database B+ Trees are unsuitable
Examined - Apache Derby, H2, HSQLDB, and Neo4j
Few Open Source pure Storage Engines in Java
Discounted MapDB - known issues
Identified 3 possibilities:
LMDB - B-Tree written in C
ForestDB - HB+-Trie written in C++11
RocksDB - LSM written in C++14
Fork of Google's LevelDB
Performance Improvements for concurrency and I/O
Optimised for SSDs
Large Open Source community with commercial interests
e.g. Facebook, AirBnb, LinkedIn, NetFlix, Uber, Yahoo, etc.
Rich feature set
LSM-tree (Log Structured Merge Tree)
MVCC (Multi-Version Concurrency Control)
ACID
Built-in Atomicity and Durability
Offers primitives for building Consistency and Isolation
Column Families
We still:
Number the tree with DLN
Extract Symbols
Shred each node into a key/value pair
We do NOT use eXist's B+Tree or Variable Record Store
Instead, we use RocksDB Column Families
Each is an LSM-tree. Share a WAL
For each component
Persisent DOM Column Family - XML_DOM_STORE
Symbol Table Column Families - SYMBOL_STORE, NAME_INDEX, NAME_ID_INDEX
Collection Store Column Family - COLLECTION_STORE
eXist-db lacks strong transaction semantics
Txn - Log commit/abort just for Crash Recovery
Mostly just the Durability of ACID
Isolation level: ~Read Uncommitted
Our Transactions
RocksDB ensures Atomicity and Durability
We add Consistency and Isolation
Begin Transaction creates a db Snapshot
Each Transaction has an in-memory Write Batch
Write - only to in-memory Write Batch
Read - try in-memory write batch, fallback to Snapshot
Isolation level: >= Snapshot Isolation
Each public API call is a transaction
e.g. REST / WebDAV / RESTXQ / XML-RPC / XML:DB
auto-abort on exception
auto-commit when the call returns data
Each XQuery is a transaction
auto-abort if the query raises an error
auto-commit when the query completes
Begin Transaction creates a db Snapshot
XQuery 3.0 try/catch
try - begins a new sub-transaction
catch - auto-abort, the operations in the try body are undone
Sub-transactions can be nested, just like try/catch
Key/Value Model
Metadata for Documents and Collections
Searchable from XQuery
Online Backup
Lock free
Checkpoint Backup
Full Document Export
UUID Locators
Persist across backups and nodes
BLOB Store
Deduplication
Many Changes Upstreamed
eXist-db - Locking, BLOB Store, Concurrency, etc.
RocksDB - further Java APIs and improved JNI performance
With hindsight...
Wouldn't fork eXist-db...
Too much time spent discovering and fixing bugs
Start with a green field, add eXist-db compatible APIs
Much more work than anticipated
Today, new storage engines
FoundationDB / BadgerDB / FASTER
Benchmarking and Performance
JSON Native
Virtualised Collections
Distributed Cluster
Graph Model
XQuery compilation to native
CPU/GPU code