The National Archives
2020-01-28
Implementation Lead for DRI @ TNA
Digital Preservation and Archiving
Hardware and Software Architecture
2011 - 2014
Developer / Consultant
XML / XQuery / XSLT / Schema / RelaxNG
Scala / Java / C++
Concurrency and Scalability
Open Source
Facebook's RocksDB K/V database
Creator of FusionDB Multi-model database
Lead developer for eXist-db XML database
Depend on PROCat/ILDB:
SAR (System for Access Regulation)
DORIS (Document Ordering and Reader Information Service)
DRI (Digital Records Infrastructure)
Discovery
PROCat/ILDB depend on:
SAR
Transfers (e.g. Transfer Form / AA2 / Blue Form)
HMC (Historical Manuscripts Commission)
Predominantly by directly replicating data between databases
PROCat/ILDB
Authoritative for Physical Records
Authoritative for Series
Developed in 2000, before born-digital
Describes 15,000,000 records
DRI Catalogue
Authoritative for Born-Digital Records
Semi-Authoritative for Digitised Records
Built to work around limitations of PROCat
Describes 3,100,000 records
See: The National Archives Digital Records Infrastructure Catalogue. Walpole, Rob. 2013
HMC (Historical Manuscripts Commission)
Authoritative for Papers in other archives
Authoritative for MDR (Manorial Documents Register)
Describes 900,000 records
Authoritative for 20,000 Authority Files
MYC (Manage Your Collections)
Information about Records held by other archives
Mostly non-authoritative
Authoritative for some smaller Archives
Describes 10,000,000 records
UK Government Web Archive
Authoritative for Snapshots of Gov Websites
Either "One-Off event" or Accumulation
Organised in Collection(s) which have PROCat Series Reference(s)
Records for over 5,000 domains (6 billion resources)
Digital Surrogate Systems
Docs Online
Describes 9,000,000 records
Record Copying
35,363 orders processed (2020-01-09)
Image Library
Consisting of 80,000 digital images
Proposing a Pan-Archival Catalogue
One Data Model
Medium Independent - e.g. Physical and Digital
Multiple Arrangements of Records
Holistic - e.g. Surrogates, Retained, etc.
Extensible for Description
One (Logical) Authoritative System
Reduce duplication and replication of data
Reduce inconsistencies across systems
Each Record has a persistent and unique identifier
ISAD(G) Derived Hierarchical Arrangement
Description is inherited
Only 3 Possible (Mono-Hierarchical) Arrangements
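A minimal sketch of what inherited description means in a mono-hierarchical arrangement: each level has exactly one parent, and a level's effective description is its ancestors' description with its own values layered on top. The `Node` type, field names, references, and values are all hypothetical and are not taken from PROCat.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One level in a mono-hierarchical arrangement (hypothetical type)."""
    reference: str
    parent: Optional["Node"] = None
    description: dict = field(default_factory=dict)

    def effective_description(self) -> dict:
        """Merge descriptions from the root down, so nearer levels override."""
        inherited = self.parent.effective_description() if self.parent else {}
        return {**inherited, **self.description}


# Illustrative references and values only, not real catalogue data.
series = Node("X 1", description={"creator": "Example Department", "language": "English"})
piece = Node("X 1/1", parent=series, description={"title": "Example piece"})
item = Node("X 1/1/1", parent=piece, description={"language": "French"})

print(item.effective_description())
```

The item inherits `creator` from the series and `title` from the piece, while its own `language` replaces the inherited one.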
Attempted to Model Transfer of Records
Derived from ISAD(G), PREMIS, and XIP
Any (Mono-Hierarchical) Arrangement
Define a Common Vocabulary (for our Conceptual Model)
Try not to re-invent any wheels / Square peg vs round hole
Analyse Existing Models and Standards, key requirements:
Independent of Record Medium
Flexibility of Arrangement
Non/Mono/Poly-Hierarchical
Multiple Arrangements
Abstract Record, Concrete Manifestation(s)
Redaction
Surrogates
Provenance
Extensible / Open World
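The requirements above can be sketched as a small data model: an abstract record that is independent of medium, one or more concrete manifestations (including surrogates and redactions), and any number of named arrangements in which a record may have no parent, one parent, or several. All class and field names are hypothetical and illustrate the requirements only, not any of the models analysed.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Manifestation:
    """A concrete form of a record (names here are hypothetical)."""
    medium: str              # e.g. "physical" or "digital"
    kind: str = "original"   # e.g. "original", "surrogate", "redacted"


@dataclass
class Record:
    """An abstract record, independent of any medium."""
    title: str
    identifier: str = field(default_factory=lambda: str(uuid4()))
    manifestations: list = field(default_factory=list)


@dataclass
class Arrangement:
    """One named arrangement; a record may have zero, one or many parents."""
    name: str
    parents: dict = field(default_factory=dict)  # child id -> set of parent ids

    def place(self, child: Record, parent: Record) -> None:
        self.parents.setdefault(child.identifier, set()).add(parent.identifier)

    def parents_of(self, record: Record) -> set:
        return self.parents.get(record.identifier, set())


letter = Record("Example letter")
letter.manifestations.append(Manifestation("physical"))
letter.manifestations.append(Manifestation("digital", kind="surrogate"))

series_a = Record("Series A")
series_b = Record("Series B")

by_creator = Arrangement("by creator")
by_creator.place(letter, series_a)     # mono-hierarchical: one parent

by_subject = Arrangement("by subject")
by_subject.place(letter, series_a)
by_subject.place(letter, series_b)     # poly-hierarchical: two parents
```

Keeping the arrangements outside the record is one possible design choice: the same record can then appear in several arrangements without its description being copied.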
TNA-CS13 (TNA Cataloguing Standard 2013)
DRI Catalogue Model
BIA (Business Information Architecture)
EAD (Encoded Archival Description)
DCAT (Data Catalog Vocabulary)
FRBR (Functional Requirements for Bibliographic Records)
RDA (Resource Description and Access)
BIBFRAME
Europeana
RiC (Records in Contexts)
Matterhorn RDF Model
BIBFRAME Lite + Archive
Against - bibliographic/library centric domain
Against - single custom ontology
RiC-O
For - ICA implementation of RiC-CM
Against - single custom ontology
Matterhorn RDF
For - same goals as RiC-CM
Against - custom implementation
For - can be validated with SHACL
For - reuses existing ontologies (DC, PREMIS, PROV, RDA)
Define URI for Records
RDF Unique Persistent Identifiers. Resolvable ???
Unlikely to replace CCR / GCR ???
Export data into Matterhorn RDF
ILDB
SAR
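A rough sketch of what an export step might look like, assuming each source row has already been read into a dictionary. It writes N-Triples by hand and uses only Dublin Core terms (`identifier`, `title`, `isPartOf`). The base URI, the row keys, and the choice of properties are assumptions for illustration and are not the actual Matterhorn RDF mapping.

```python
DCT = "http://purl.org/dc/terms/"
BASE = "https://example.org/record/"  # hypothetical base URI for records


def literal(value: str) -> str:
    """Escape a string as a plain N-Triples literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples statement; obj is already a literal or <IRI>."""
    return f"<{subject}> <{predicate}> {obj} ."


def export_record(row: dict) -> list:
    """Turn one source row (hypothetical keys) into N-Triples lines."""
    subject = BASE + row["id"]
    lines = [
        triple(subject, DCT + "identifier", literal(row["reference"])),
        triple(subject, DCT + "title", literal(row["title"])),
    ]
    if row.get("parent_id"):
        lines.append(triple(subject, DCT + "isPartOf", f"<{BASE}{row['parent_id']}>"))
    return lines


# Illustrative row only, not real catalogue data.
for line in export_record({"id": "abc", "reference": "X 1/1",
                           "title": "Example piece", "parent_id": "def"}):
    print(line)
```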
Implement (partial) PROCat Replacement
Backend is an API fronting the Database
Front-end only speaks to API
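The layering described above can be sketched as follows: the API is the only component that touches the database, and front-end code depends only on the API. The class names, method names, and in-memory store are hypothetical stand-ins, not the planned implementation.

```python
class CatalogueStore:
    """Stand-in for the database; an in-memory dict in this sketch."""

    def __init__(self):
        self._rows = {}

    def put(self, record_id, row):
        self._rows[record_id] = dict(row)

    def get(self, record_id):
        return self._rows.get(record_id)


class CatalogueApi:
    """The only component that talks to the store."""

    def __init__(self, store):
        self._store = store

    def create_record(self, record_id, title):
        # Validation lives in the API so every client gets the same rules.
        if not title.strip():
            raise ValueError("title must not be empty")
        self._store.put(record_id, {"id": record_id, "title": title.strip()})
        return self.get_record(record_id)

    def get_record(self, record_id):
        row = self._store.get(record_id)
        if row is None:
            raise KeyError(record_id)
        return dict(row)


def render_record(api, record_id):
    """Front-end code: depends on the API, never on the store."""
    record = api.get_record(record_id)
    return f"{record['id']}: {record['title']}"


api = CatalogueApi(CatalogueStore())
api.create_record("r1", "  Example record  ")
print(render_record(api, "r1"))
```

Because the front-end never sees the store, the database behind the API could be replaced without changing front-end code.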
Document our Catalogue Model and Guidelines
Stretch goal: import some DRI Born Digital records