eXist-db Community Meetup
XML Prague 08/02/2018
eXist-db Core Dev (13 years!)
Consultant
Concurrency and Databases
Scala / Java / C++ / XQuery / XSLT
Open Source Hacker
NoSQL: eXist-db / RocksDB
CSV Validator / UTF-8 Validator / Shadoop
Many other smaller contributions...
W3C Invited Expert for XQuery WG
Author of the "eXist" book for O'Reilly
The last year of work at Evolved Binary
Concurrency in eXist-db
Multi-user Transactions
Sharded Caches
Memory barriers - i.e. Locks
Problems identified with Locking in eXist-db
Improvements/Solutions
Corruptions in eXist-db became unbearable
Evolved Binary started developing Granite (~2015)
R&D project to build a better Database for structured information
Started from eXist-db, replacing its BTree storage
Transaction Isolation differences
eXist-db likely offers Repeatable Reads isolation level
Granite should offer at least Snapshot Isolation
eXist-db's Collection Cache not Transaction/Isolation safe
Goal: We need a better Collection Cache
Problem: Replacing the Collection Cache opened up many concurrency problems
Many operations are synchronized(collectionCache)
Performance effectively single-threaded for Collection ops
Introduced to avoid previous deadlocks and corruptions
Shared mutable state between transactions
Lack of transaction isolation
Fine for Repeatable Read in eXist-db (if you know)
Granite wants better Isolation support
Current approach restricts possible concurrency improvements
Unless you sacrifice consistency
Requirement: Transaction aware and Isolation safe
Two Levels
Transaction Local
Mutable
per-Transaction
Read-through to Global
Write version to Global on Commit
Global
Immutable
Versioned and GC'd
Remove synchronized(collectionCache) paths
for performance
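The two-level design above can be sketched roughly as follows. This is a minimal illustration only, not the eXist-db or Granite implementation: the class `TwoLevelCache`, its `TxnView` and all method names are hypothetical, and commit is a simple last-writer-wins merge with no conflict detection.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; names are illustrative, not eXist-db or Granite API.
final class TwoLevelCache<K, V> {

    // Global level: an immutable version, replaced atomically on commit.
    // Old versions become garbage once no transaction references them.
    private final AtomicReference<Map<K, V>> global =
            new AtomicReference<>(Collections.<K, V>emptyMap());

    // Transaction-local level: mutable and private to one transaction.
    final class TxnView {
        private final Map<K, V> snapshot = global.get(); // version seen at begin
        private final Map<K, V> local = new HashMap<>();

        V get(final K key) {
            final V value = local.get(key);
            return value != null ? value : snapshot.get(key); // read-through to Global
        }

        void put(final K key, final V value) {
            local.put(key, value);
        }

        // Publish a new immutable Global version that includes the local writes.
        void commit() {
            global.updateAndGet(current -> {
                final Map<K, V> next = new HashMap<>(current);
                next.putAll(local);
                return Collections.unmodifiableMap(next);
            });
        }
    }

    TxnView begin() {
        return new TxnView();
    }
}
```

A transaction that began before another one commits keeps reading its own snapshot, which is the isolation property the requirement asks for.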
Revealed several deadlock scenarios
Revealed further data corruption opportunities
Showed inconsistent design and use of Collection/Document locks
Inconsistent use of Locks
Inconsistent Lock Interleaving
Use of Incorrect Lock Modes - Read vs. Write
Lock Leaks
Accidental Lock Release
Insufficient Locking
Overzealous Locking
Correctness of Lock Implementations
Lack of Concurrency
One per in-memory Java Collection Object
should only be zero-or-one Java Object in-memory per database Collection
Guards both mutable Java Object state and collections.dbx entry
Implementation: org.exist.storage.lock.ReentrantReadWriteLock
Not actually Read/Write, really a Mutex!
"Modified" EDU.oswego.cs.dl.util.concurrent.ReentrantLock
Exact Provenance is unclear
Correctness is unproven
One per in-memory Java Document Object
should only be zero-or-one Java Document in-memory per database Collection's Document
Guards both mutable Java Object state, and collections.dbx and dom.dbx entry
Implementation: org.exist.storage.lock.MultiReadReentrantLock
Similar to Java SE's ReentrantReadWriteLock?
Writer Biased
Allows Lock upgrading, i.e.: READ_LOCK -> WRITE_LOCK
Adapted from Apache Turbine JCS project
Exact Provenance is unclear
Correctness is unproven
Before solutions, we must understand the problems!
Centralises all locking operations
Reports all locking events to the Lock Table
Lock Identity
Now per-URI rather than per-Object
Impossible to have two in-memory Java Objects for the same database object
Can acquire in advance of creating the database object
Lock Table
Registerable Event Listeners
JMX Output
Snapshots and Traces
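One way to picture per-URI lock identity with event reporting is the sketch below. It assumes a plain `ConcurrentHashMap` keyed by URI string; `UriLockRegistry`, `LockEventListener` and the event names are hypothetical and are not eXist-db's actual classes.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names are illustrative, not eXist-db's API.
final class UriLockRegistry {

    interface LockEventListener {
        void onEvent(String action, String uri, String mode, String owner);
    }

    private final ConcurrentMap<String, ReentrantReadWriteLock> locks =
            new ConcurrentHashMap<>();
    private final List<LockEventListener> listeners = new CopyOnWriteArrayList<>();

    void register(final LockEventListener listener) {
        listeners.add(listener);
    }

    // One lock per URI: the same URI always yields the same lock instance,
    // even before any database object exists for that URI.
    ReentrantReadWriteLock lockFor(final String uri) {
        return locks.computeIfAbsent(uri, k -> new ReentrantReadWriteLock());
    }

    void acquireRead(final String uri) {
        fire("Attempt", uri, "READ_LOCK");
        lockFor(uri).readLock().lock();
        fire("Acquired", uri, "READ_LOCK");
    }

    void releaseRead(final String uri) {
        lockFor(uri).readLock().unlock();
        fire("Released", uri, "READ_LOCK");
    }

    // Every locking event is reported to all registered listeners.
    private void fire(final String action, final String uri, final String mode) {
        final String owner = Thread.currentThread().getName();
        for (final LockEventListener listener : listeners) {
            listener.onEvent(action, uri, mode, owner);
        }
    }
}
```

Because the lock is looked up by URI rather than stored on an object, it can be acquired before the Collection or Document it protects has been created.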
Acquired Locks
------------------------------------
/db/test
COLLECTION
READ_LOCK concurrencyTest-remove-12 (count=1),
concurrencyTest-remove-23 (count=1),
concurrencyTest-remove-21 (count=1),
concurrencyTest-remove-1 (count=1),
/db
COLLECTION
INTENTION_WRITE concurrencyTest-remove-0 (count=1)
/db/test/test1.xml
DOCUMENT
WRITE_LOCK concurrencyTest-remove-0 (count=1)
Attempting Locks
------------------------------------
/db/test
COLLECTION
WRITE_LOCK concurrencyTest-remove-0
Simply set locks.log to "trace" in log4j2.xml
2018-02-07 18:16:42,877 TRACE - Acquired COLLECTION#1133260707637130
(WRITE_LOCK) of /db/system/security/exist by main at 1133260707641681. count=2
2018-02-07 18:16:42,891 TRACE - Attempt COLLECTION#1133260707637130
(WRITE_LOCK) of /db/system/security/exist/groups by main at 1133260707642002
2018-02-07 18:16:42,891 TRACE - Acquired COLLECTION#1133260707637130
(WRITE_LOCK) of /db/system/security/exist/groups by main at 1133260707642140. count=2
2018-02-07 18:16:42,891 TRACE - Attempt DOCUMENT#1133260707647983
(WRITE_LOCK) of /db/system/security/exist/groups/eXide.xml by main at 1133260707648578
2018-02-07 18:16:42,891 TRACE - Acquired DOCUMENT#1133260707647983
(WRITE_LOCK) of /db/system/security/exist/groups/eXide.xml by main at 1133260707649404. count=1
2018-02-07 18:16:42,891 TRACE - Attempt COLLECTION#1133260707653300
(INTENTION_READ) of /db by main at 1133260707653769
2018-02-07 18:16:42,891 TRACE - Acquired COLLECTION#1133260707653300
(INTENTION_READ) of /db by main at 1133260707654041. count=1
2018-02-07 18:16:42,891 TRACE - Attempt COLLECTION#1133260707653300
(INTENTION_READ) of /db/system by main at 1133260707654349
2018-02-07 18:16:42,891 TRACE - Acquired COLLECTION#1133260707653300
(INTENTION_READ) of /db/system by main at 1133260707654480. count=1
Are eXist's lock implementations trustworthy?
We don't know the Provenance!
No known proofs of Correctness!
Likely not used in other projects...
Replaced with Java SE's implementations
Fixed paths which performed lock upgrading
Collections/Documents: Java SE's ReentrantReadWriteLock
Collections now Reader/Writer (not Mutex)
Still mutex on Collection Cache and collections.dbx!
Some Java SE deadlock detection support, e.g. jconsole
Acquired with Lock#lockInterruptibly()
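Java SE's ReentrantReadWriteLock does not support upgrading from READ to WRITE, which is why upgrade paths had to be fixed. The sketch below illustrates that behaviour and the supported alternative (acquire WRITE interruptibly, then downgrade). The class `UpgradeDemo` and its method names are illustrative only.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only; not eXist-db code.
final class UpgradeDemo {

    // Returns true if WRITE could be taken while this thread holds READ.
    // With ReentrantReadWriteLock this is never the case.
    static boolean tryUpgrade(final ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            final boolean upgraded = lock.writeLock().tryLock();
            if (upgraded) {
                lock.writeLock().unlock();
            }
            return upgraded;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Supported pattern: take WRITE (interruptibly), then downgrade to READ.
    // Returns the read hold count observed after the downgrade.
    static int writeThenDowngrade(final ReentrantReadWriteLock lock) {
        try {
            lock.writeLock().lockInterruptibly();
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted waiting for WRITE_LOCK", e);
        }
        try {
            // ... mutate shared state here ...
            lock.readLock().lock(); // downgrade: take READ while holding WRITE
        } finally {
            lock.writeLock().unlock();
        }
        try {
            return lock.getReadHoldCount();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Using `lockInterruptibly()` means a blocked thread can be interrupted by an administrator or a watchdog instead of waiting forever.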
Replaced with Java SE's implementations
.dbx files: Java SE's ReentrantLock
Complex Relationship between BTree and BTreeCache
Existing functions often request the (overall) wrong lock mode
eXist's ReentrantReadWriteLock was (really) a mutex, so previously not a problem
Difficult to make Reader/Writer
Lock implementations are now widely used, with well-known Provenance and Correctness
Reduces: Lock Leaks and Accidental Lock Releases
ARM (Automatic Resource Management) constructs enforce release through syntax
e.g. try-with-resources
Lock(s) are always correctly released
We provide:
ManagedLock
ManagedCollectionLock
ManagedDocumentLock
LockedCollection
LockedDocument
Example, before Managed Locks:
Collection collection = null;
try {
    collection = broker.openCollection("/db/x/y", LockMode.READ_LOCK);
    DocumentImpl resource = null;
    try {
        resource = collection.getDocumentWithLock(broker, "doc1.xml",
                LockMode.READ_LOCK);
        // now do something with the document
    } finally {
        if (resource != null) {
            resource.getUpdateLock().release(LockMode.READ_LOCK);
        }
    }
} finally {
    if (collection != null) {
        collection.release(LockMode.READ_LOCK);
    }
}
Example, with Managed Locks:
try (final Collection collection = broker.openCollection("/db/x/y",
        LockMode.READ_LOCK);
     final LockedDocument resource = collection.getDocumentWithLock(broker,
        "doc1.xml", LockMode.READ_LOCK)
) {
    // now do something with the document
}
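A managed lock of this kind can be approximated with a small AutoCloseable wrapper. The sketch below is hypothetical (`ManagedLockSketch` is not the eXist-db `ManagedLock` class) and assumes an idempotent `close()`, so that releasing early and then leaving the try-with-resources block unlocks only once.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; not the eXist-db ManagedLock implementation.
final class ManagedLockSketch implements AutoCloseable {

    private final Lock lock;
    private boolean closed = false;

    private ManagedLockSketch(final Lock lock) {
        this.lock = lock;
    }

    // Acquire the lock and hand back a handle whose close() releases it.
    static ManagedLockSketch acquire(final Lock lock) {
        lock.lock();
        return new ManagedLockSketch(lock);
    }

    // Idempotent: a second close() is a no-op, so an early explicit close()
    // followed by the implicit one from try-with-resources is safe.
    // The handle is meant to be used by the acquiring thread only.
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            lock.unlock();
        }
    }

    public static void main(final String[] args) {
        final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        try (ManagedLockSketch ignored = acquire(rw.readLock())) {
            System.out.println("read locks held: " + rw.getReadLockCount());
        }
        System.out.println("read locks held: " + rw.getReadLockCount());
    }
}
```

The compiler, rather than developer discipline, now guarantees that the release happens on every exit path.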
Deadlock Avoidance: Iterate objects in stable global order
Modified Collection's sub-Collections iterator
Previously unstable order - backed by a HashSet
Now backed by a LinkedHashSet, provides insertion order
Modified Collection's Documents iterator
Previously unstable order, backed by a TreeMap... ordered by Document ID!
Now backed by a LinkedHashMap, provides insertion order
Modified DefaultDocumentSet's iterator
Previously unstable order, backed by an Int2ObjectHashMap
Now backed by a LinkedHashSet, provides insertion order
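The point of a stable iteration order is that every thread locking the members of the same set acquires those locks in the same sequence. A minimal sketch, assuming a hypothetical `Resource` type with one lock each (not eXist-db's Collection or Document classes):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; Resource stands in for a Collection or Document.
final class OrderedLocking {

    static final class Resource {
        final String uri;
        final ReentrantLock lock = new ReentrantLock();

        Resource(final String uri) {
            this.uri = uri;
        }
    }

    // Lock every resource in the set's iteration order and report that order.
    // With a LinkedHashSet the order is the insertion order, so it is the
    // same for every thread iterating the same set.
    static List<String> lockAll(final Set<Resource> resources) {
        final List<String> order = new ArrayList<>();
        for (final Resource resource : resources) {
            resource.lock.lock();
            order.add(resource.uri);
        }
        return order;
    }

    static void unlockAll(final Set<Resource> resources) {
        for (final Resource resource : resources) {
            resource.lock.unlock();
        }
    }

    public static void main(final String[] args) {
        final Set<Resource> docs = new LinkedHashSet<>();
        docs.add(new Resource("/db/test/c.xml"));
        docs.add(new Resource("/db/test/a.xml"));
        System.out.println(lockAll(docs));
        unlockAll(docs);
    }
}
```

If two threads could see the members in different orders, one might hold the first lock while waiting for the second, and the other the reverse, which is the classic deadlock.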
Deadlock Avoidance: Always mix Collection/Document locks in same order
Mainly two patterns previously:
Symmetrical
i.e.: Lock Collection, Lock Document, Unlock Document, Unlock Collection
Easiest to provide managed constructs for, e.g. Managed Locks
Asymmetrical
i.e. Lock Collection, Lock Document, Unlock Collection, Unlock Document
Most flexible
Offers best concurrency... can release Collection lock early!
Explicitly settled on the Asymmetrical pattern
Refactored eXist-db to exclusively use Asymmetrical pattern
Commented code to remind developers of Asymmetrical Pattern at each site of use
Documented the pattern
try (final Collection collection = broker.openCollection("/db/x/y",
        LockMode.READ_LOCK)) {
    // ...do something with *just* the Collection
    try (final LockedDocument resource = collection.getDocumentWithLock(
            broker, "doc1.xml", LockMode.READ_LOCK)) {
        // ...do something with the Collection and Document
        // NOTE: early release of Collection lock in line with Asymmetrical Locking scheme
        collection.close();
        // ...finally do something with *just* the Document
    }
}
Reduces: Incorrect Lock Modes, Lock Leaks, Accidental Lock Releases and Insufficient Locking
Explicitly Documents (and enforces) locking contracts
We provide Java Annotations (for developers):
@EnsureLocked / @EnsureUnlocked
Lock mode must / must not be held on a parameter or return object
@EnsureContainerLocked / @EnsureContainerUnlocked
Lock mode must / must not be held on the object of a method call
Using Aspect Oriented Programming:
Can log violations to ensure-locking.log
Can throw an exception when a violation is detected
Designed to be used at test time (not production)
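The idea of a locking contract expressed as an annotation can be sketched without AspectJ. In the example below a reflective `enforce` method stands in for the aspect. The annotation `@RequiresWriteLock`, the `Coll` type and every other name are hypothetical, and the real `@EnsureLocked` annotations may take different parameters.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; not eXist-db's annotations or aspect.
final class LockContracts {

    // Contract marker: the caller must hold WRITE_LOCK on this parameter.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface RequiresWriteLock {}

    static final class Coll {
        final String uri;
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        Coll(final String uri) {
            this.uri = uri;
        }
    }

    // A method that documents its locking contract on its parameter.
    static void removeCollection(@RequiresWriteLock final Coll collection) {
        // ... removal logic would go here ...
    }

    // Stand-in for the aspect: checks annotated parameters before a call.
    static void enforce(final Method method, final Object... args) {
        final Annotation[][] annotations = method.getParameterAnnotations();
        for (int i = 0; i < args.length; i++) {
            for (final Annotation annotation : annotations[i]) {
                if (annotation instanceof RequiresWriteLock && args[i] instanceof Coll) {
                    final Coll c = (Coll) args[i];
                    if (!c.lock.isWriteLockedByCurrentThread()) {
                        throw new IllegalStateException(
                                "Constraint to require lock mode WRITE_LOCK on Collection: " + c.uri);
                    }
                }
            }
        }
    }

    // True if calling removeCollection(c) now would violate the contract.
    static boolean violates(final Coll c) {
        try {
            enforce(LockContracts.class.getDeclaredMethod("removeCollection", Coll.class), c);
            return false;
        } catch (final IllegalStateException e) {
            return true;
        } catch (final NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }
}
```

Reflection on every call is expensive, which is one reason checks of this kind belong in test runs rather than production.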
Example lock contract violation(s) log:
FAILED: Constraint to require lock mode WRITE_LOCK on Collection: /db/test
<- org.exist.storage.lock.EnsureLockingAspect.
enforceEnsureLockedParameters(EnsureLockingAspect.java:161)
<- org.exist.storage.NativeBroker.removeCollection(NativeBroker.java:1665)
<- org.exist.dom.persistent.NodeTest.tearDown(NodeTest.java:239)
<- sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
FAILED: Constraint to require lock mode READ_LOCK on Document: /db/test/test.xml
<- org.exist.storage.lock.EnsureLockingAspect.
enforceEnsureLockedContainer(EnsureLockingAspect.java:303)
<- org.exist.dom.persistent.DocumentImpl.getDocId(DocumentImpl.java:197)
<- org.exist.indexing.range.RangeIndexWorker.removeCollection(RangeIndexWorker.java:363)
<- org.exist.indexing.IndexController.removeCollection(IndexController.java:207)
FAILED: Constraint to require lock mode READ_LOCK on Document: /db/test/test.xml
<- org.exist.storage.lock.EnsureLockingAspect.
enforceEnsureLockedContainer(EnsureLockingAspect.java:303)
<- org.exist.dom.persistent.DocumentImpl.getDocId(DocumentImpl.java:197)
<- org.exist.storage.structural.NativeStructuralIndexWorker.
getQNamesForDoc(NativeStructuralIndexWorker.java:540)
<- org.exist.storage.structural.NativeStructuralIndexWorker.
removeDocument(NativeStructuralIndexWorker.java:505)
Attempt to find a Deadlock free Collection Locking scheme
Many options investigated!
Collection hierarchy in eXist-db is a tree!
Adopted a Hierarchical Locking Scheme
Granularity of Locks in a Shared Data Base - Gray et al. 1975
Lock from the tree's root node to the most granular node of interest
Locking a node in the tree implies locking descendants
Multiple lock modes: IS, S, IX, SIX, and X
Weaker intention locks are used at higher levels
Not deadlock free under all conditions
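The five modes and the root-to-leaf rule can be sketched as below. The compatibility matrix is the standard one from the hierarchical locking literature. The enum `LockModeSketch` and the `plan` helper are hypothetical illustrations, not eXist-db code, and `plan` assumes an absolute path beginning with `/`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hierarchical lock modes; not eXist-db's LockMode.
enum LockModeSketch {
    IS, IX, S, SIX, X;

    // Standard compatibility matrix; rows and columns in declaration order.
    private static final boolean[][] COMPATIBLE = {
        //           IS     IX     S      SIX    X
        /* IS  */ { true,  true,  true,  true,  false },
        /* IX  */ { true,  true,  false, false, false },
        /* S   */ { true,  false, true,  false, false },
        /* SIX */ { true,  false, false, false, false },
        /* X   */ { false, false, false, false, false },
    };

    // May two different transactions hold these modes on the same node?
    boolean compatibleWith(final LockModeSketch other) {
        return COMPATIBLE[ordinal()][other.ordinal()];
    }

    // Root-to-leaf plan: an intention mode on each ancestor, then the
    // requested shared or exclusive mode on the target node itself.
    static List<String> plan(final String path, final boolean write) {
        final List<String> steps = new ArrayList<>();
        final String[] segments = path.substring(1).split("/");
        final StringBuilder uri = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            uri.append('/').append(segments[i]);
            final boolean target = (i == segments.length - 1);
            final LockModeSketch mode = target ? (write ? X : S) : (write ? IX : IS);
            steps.add(mode + " " + uri);
        }
        return steps;
    }
}
```

For example, `LockModeSketch.plan("/db/test", true)` yields `IX /db` followed by `X /db/test`, which mirrors the INTENTION_WRITE on `/db` and WRITE_LOCK on `/db/test` seen in the Lock Table snapshot.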
Our modified implementation: Granularity of Locks in a Shared Data Base
Mode 1: Multi-Writer / Multi-Reader
Better performance
Not deadlock free... unless user designs Collection hierarchy suitably
Mode 2: Single-Writer / Multi-Reader
Deadlock free
Restricts writes to any single Collection at once (likely happened previously)
Long running writes can block reads (likely happened previously)
The Default
Does not consider Documents!
Deadlocks can still occur between Collection and Documents
Could easily be extended to incorporate Documents
Previously: synchronized(collectionCache)
But... We have now addressed the locking issues!
Replaced eXist's Collection Cache:
Previously HashMap with LRU Policy
Adopted Caffeine from Ben Manes
Provides both size and age bounds
Now TinyLFU policy - more performant
ConcurrentHashMap like interface
Comprehensive Cache Statistics available through JMX
Example Collection Cache JMX:
Many Improvements to eXist-db
Standard Java Locks
Improved Deadlock Avoidance
Managed Locks offer safety through syntax
Documented Locking Patterns
Corrected various lock use problems in the code base
Tools: EnsureLocked Annotations, LockTable tracing
Deadlocks Happen!
eXist-db cannot yet abort a Transaction without risking corruption
Provides a good foundation for future work...