Adam Retter
adam@evolvedbinary.com
The National Archives
2020-01-28
@adamretter
Pan-Archival Cataloguing
You look familiar!
-
Implementation Lead for DRI @ TNA
-
Digital Preservation and Archiving
-
Hardware and Software Architecture
-
2011 - 2014
-
-
Developer / Consultant
-
XML / XQuery / XSLT / Schema / RelaxNG
-
Scala / Java / C++
-
Concurrency and Scalability
-
-
Open Source
-
Facebook's RocksDB K/V database
-
Creator of FusionDB Multi-model database
-
Lead developer for eXist-db XML database
-
Nov 2019: Project Initiation
-
Depend on PROCat/ILDB:
-
SAR (System for Access Regulation)
-
DORIS (Document Ordering and Reader Information Service)
-
DRI (Digital Records Infrastructure)
-
Discovery
-
-
PROCat/ILDB depend on:
-
SAR
-
Transfers (e.g. Transfer Form / AA2 / Blue Form)
-
HMC (Historical Manuscripts Commission)
-
-
Predominantly by directly replicating data between databases
Discovery Phase - Integration
-
PROCat/ILDB
-
Authoritative for Physical Records
-
Authoritative for Series
-
Developed in 2000, before born-digital
-
Describes 15,000,000 records
-
-
DRI Catalogue
-
Authoritative for Born-Digital Records
-
Semi-Authoritative for Digitised Records
-
Built to work around limitations of PROCat
-
Describes 3,100,000 records
-
See: The National Archives Digital Records Infrastructure Catalogue. Walpole, Rob. 2013
-
Discovery Phase - N Catalogues!
-
HMC (Historical Manuscripts Commission)
-
Authoritative for Papers in other archives
-
Authoritative for MDR (Manorial Documents Register)
-
Describes 900,000 records
-
Authoritative for 20,000 Authority Files
-
-
MYC (Manage Your Collections)
-
Information about Records held by other archives
-
Mostly non-authoritative
-
Authoritative for some smaller Archives
-
Describes 10,000,000 records
-
Discovery Phase - N Catalogues!
-
UK Government Web Archive
-
Authoritative for Snapshots of Gov Websites
-
Either "One-Off event" or Accumulation
-
Organised in Collection(s) which have PROCat Series Reference(s)
-
Records for over 5,000 domains (6 billion resources)
-
-
Digital Surrogate Systems
-
Docs Online
-
Describes 9,000,000 records
-
-
Record Copying
-
35,363 orders processed (2020-01-09)
-
-
Image Library
-
Consisting of 80,000 digital images
-
-
Discovery Phase - N Catalogues!
-
Proposing a Pan-Archival Catalogue
-
One Data Model
-
Medium Independent - e.g. Physical and Digital
-
Multiple Arrangements of Records
-
Holistic - e.g. Surrogates, Retained, etc.
-
Extensible for Description
-
-
One (Logical) Authoritative System
-
Reduce duplication and replication of data
-
Reduce inconsistencies across systems
-
Each Record has a persistent and unique identifier
-
-
Catalogue Data Model
TNA Cataloguing Standard
-
ISAD(G) Derived Hierarchical Arrangement
-
Description is inherited
-
-
Only 3 Possible (Mono-Hierarchical) Arrangements
TNA Cataloguing Standard
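The inheritance of description down an ISAD(G)-style hierarchy, as noted for the TNA Cataloguing Standard, can be sketched roughly as follows. This is only an illustration: the `Node` class, its field names, and the sample references are assumptions, not part of the standard or of PROCat.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One level of a mono-hierarchical arrangement (hypothetical structure)."""
    reference: str
    description: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Node"] = None  # exactly one parent: mono-hierarchical

    def effective_description(self) -> Dict[str, str]:
        """Merge descriptions from the root down; nearer levels override."""
        inherited = self.parent.effective_description() if self.parent else {}
        return {**inherited, **self.description}


# Illustrative data only; the references and field names are made up.
department = Node("DEPT", {"creator": "Example Department", "language": "English"})
series = Node("DEPT 1", {"title": "Example Series"}, parent=department)
piece = Node("DEPT 1/1", {"title": "Example Piece", "date": "1900"}, parent=series)

print(piece.effective_description())
```

The piece states only its own title and date; the creator and language come from the levels above it.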
-
Attempted to Model Transfer of Records
DRI Catalogue Model
-
Derived from ISAD(G), PREMIS, and XIP
DRI Catalogue Model
-
Any (Mono-Hierarchical) Arrangement
DRI Catalogue Model
Choosing a New Model
-
Define a Common Vocabulary (for our Conceptual Model)
-
Try not to re-invent any wheels / Square peg vs round hole
-
Analyse Existing Models and Standards, key requirements:
-
Independent of Record Medium
-
Flexibility of Arrangement
-
Non/Mono/Poly-Hierarchical
-
Multiple Arrangements
-
-
Abstract Record, Concrete Manifestation(s)
-
Redaction
-
Surrogates
-
-
Provenance
-
Extensible / Open World
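To make the requirements above more concrete, here is a minimal sketch of how an abstract record, its concrete manifestations, and multiple (possibly poly-hierarchical) arrangements might relate. All class and field names are hypothetical and chosen for illustration; they are not taken from any of the models under review.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Manifestation:
    """A concrete form of a record (all names here are hypothetical)."""
    medium: str               # e.g. "physical" or "digital"
    kind: str = "original"    # "original", "surrogate" or "redacted"


@dataclass
class Record:
    """An abstract record, independent of the medium of its manifestations."""
    identifier: str
    description: Dict[str, str] = field(default_factory=dict)  # open-ended
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Arrangement:
    """One of possibly many arrangements over the same records."""
    name: str
    # Child identifier -> parent identifiers. No parents means top level;
    # more than one parent means the arrangement is poly-hierarchical.
    parents: Dict[str, List[str]] = field(default_factory=dict)

    def place(self, child: Record, *parent_records: Record) -> None:
        self.parents[child.identifier] = [p.identifier for p in parent_records]


# Illustrative use: one record, two manifestations, two arrangements.
series = Record("rec-1", {"title": "Example Series"})
theme = Record("rec-2", {"title": "Example Theme"})
letter = Record(
    "rec-3",
    {"title": "Example Letter"},
    [Manifestation("physical"), Manifestation("digital", kind="surrogate")],
)

by_creator = Arrangement("by creating body")
by_creator.place(letter, series)

by_subject = Arrangement("by subject")
by_subject.place(letter, series, theme)  # two parents: poly-hierarchical

print(by_creator.parents[letter.identifier])
print(by_subject.parents[letter.identifier])
```

Keeping arrangements separate from records is one possible way to let the same record appear in several arrangements without duplicating its description.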
-
Existing Models and Standards
-
TNA-CS13 (TNA Cataloguing Standard 2013)
-
DRI Catalogue Model
-
BIA (Business Information Architecture)
-
EAD (Encoded Archival Description)
-
DCAT (Data Catalog Vocabulary)
-
FRBR (Functional Requirements for Bibliographic Records)
-
RDA (Resource Description and Access)
-
BIBFRAME
-
Europeana
-
RiC (Records in Contexts)
-
Matterhorn RDF Model
Top 3 Models
-
BIBFRAME Lite + Archive
-
Against - bibliographic/library centric domain
-
Against - single custom ontology
-
-
RiC-O
-
For - ICA implementation of RiC-CM
-
Against - single custom ontology
-
-
Matterhorn RDF
-
For - same goals as RiC-CM
-
Against - custom implementation
-
For - can be validated with SHACL
-
For - reuses existing ontologies (DC, PREMIS, PROV, RDA)
-
Project Omega, Next Steps...
-
Define URI for Records
-
RDF Unique Persistent Identifiers. Resolvable ???
-
Unlikely to replace CCR / GCR ???
-
-
Export data into Matterhorn RDF
-
ILDB
-
SAR
-
-
Implement (partial) PROCat Replacement
-
Backend is an API fronting the Database
-
Front-end only speaks to API
-
-
Document our Catalogue Model and Guidelines
-
Stretch goal: import some DRI Born Digital records
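As a rough illustration of the first two steps above (defining a URI for each record and exporting data as RDF), the sketch below mints a repeatable identifier from a legacy reference and serialises a few properties as N-Triples. The base URI, the use of a name-based UUID, and the choice of Dublin Core terms are all assumptions made for the example; the actual URI scheme is still open, and this is not the Matterhorn RDF mapping itself.

```python
import uuid

# Hypothetical base URI; the real scheme is undecided.
BASE_URI = "http://example.org/record/"
# Dublin Core terms namespace.
DCT = "http://purl.org/dc/terms/"


def mint_record_uri(legacy_reference: str) -> str:
    """Derive a repeatable URI from a legacy reference via a name-based UUID."""
    name = BASE_URI + legacy_reference
    return BASE_URI + str(uuid.uuid5(uuid.NAMESPACE_URL, name))


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples string literals require."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject: str, properties: dict) -> str:
    """Serialise predicate -> plain string literal pairs, one triple per line."""
    lines = []
    for predicate, value in properties.items():
        lines.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# Illustrative data only; the reference and title are made up.
uri = mint_record_uri("EXAMPLE 1/2")
print(to_ntriples(uri, {
    DCT + "identifier": "EXAMPLE 1/2",
    DCT + "title": 'An "example" record',
}))
```

Because the UUID is derived from the reference rather than generated randomly, re-running an export would produce the same URI for the same record, which is one way to approach persistence.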
Ultimate Goal for World Archives Domination:
A linked-data knowledge graph of the entire organisation's assets.
Questions?
Talk given for Project Omega at The National Archives - 28 January 2020 - Kew, London