MarkupUK @ Online
2020-06-25
Developer / Consultant
XQuery / XSLT / Schema / RelaxNG
Scala / Java / C++
Concurrency and Scalability
Creator of FusionDB multi-model database (4 yrs.)
Contributor to Facebook's RocksDB (4 yrs.)
Core contributor to eXist-db XML Database (14 yrs.)
Founder of EXQuery, and creator of RESTXQ
W3C XQuery WG Invited expert
Before XProc there was XPL
~2005 - https://www.w3.org/Submission/xpl/
Before XPL there was XQuery
~2001 - https://www.w3.org/TR/2001/WD-xquery-20010215/
Before XQuery there was XPath
~1999 - https://www.w3.org/1999/07/WD-xpath-19990709
Before XPath there was XSLT
~1998 - https://www.w3.org/TR/1998/WD-xsl-19980818
Back in 2006...
XPL looked nice but was Orbeon only
Apache Cocoon was ugly... and dying?
So... I wrote "pipelines" in XQuery (and XSLT)!
Many modules of small functions
XQuery -> XQuery -> XSLT -> XQuery -> XSLT -> XHTML / XUpdate
Now...
Anything XProc can do... I can do in XQuery!
Maybe I can even do more!
Why do I need XProc?
Is it's simplicity!
Data Model supports XML/JSON/Text/Binary
Mixing XML and XQuery in the same file
Creates dynamic documents
Depends on the Implementation!
HTTP Server fits well with the Web
Can produce XHTML/HTML5 dynamic websites
Much comes from EXPath/EXQuery/Vendor extensions!
Is there a proportionate relationsip between Covid-19 mortalities and Covid-19 relief funding?
Is more or less medical research conducted in areas with a higher Covid-19 mortality rate?
Today we will aggregate data from...
Various Web API using HTTP
Relational database using SQL
Graph (RDF) using SPARQL
JSON
XML Database
Binary files
We will then produce...
An HTML Page using XSLT
A PDF Report using XSL:FO
Medical Research Journal Articles
Statistics on Covid-19 Deaths
Statistics on Covid-19 Relief Funding
Mapping to/from State Abbreviations and State Names
National Library of Medicine
Pubmed dataset - keyword "covid"
Provides EUtils Web API
Available in XML
Limited to 10,000 records per-request
Our XQuery
Use XQuery extension module - EXPath HTTP
Call Search API
Call Fetch API n-times in pages of 10,000 results
Use XQuery extension module - XMLDB
Store XML documents into XML Database
See: download-pubmed-store-xml.xq
Centers for Disease Control and Prevention
Provisional COVID-19 Death Counts by Sex, Age, and State
https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-S/9bhg-hcku
Available in CSV
Our XQuery
Use XQuery extension module - EXPath HTTP
Call Download API
Parse the CSV into XDM Array
Cleanup CSV
Use XQuery extension module - SQL JDBC
Create SQL Table and Insert rows (MariaDB)
See: download-cdc-upload-sql.xq
Centers for Disease Control and Prevention
HHS (Health and Human Services) Provider Relief Fund
https://data.cdc.gov/Administrative/HHS-Provider-Relief-Fund/kh8y-3es6
Available in RDF
Our XQuery
Use XQuery extension module - EXPath HTTP
Call Download API
Use XQuery extension module - XMLDB
Store RDF document into XML Database *
See: download-hhs-store-rdf.xq
World Population Review
List of State Abbreviations
https://worldpopulationreview.com/states/state-abbreviations/
Available in JSON
Our XQuery
Use XQuery extension module - EXPath HTTP
Call Download API
Use XQuery extension module - XMLDB
Store JSON document into XML Database
See: download-states-store-json.xq
Data is in SQL (MariaDB - 1,416 rows)
1. Examine the Data
2. Use the XQuery extension module - SQL JDBC
3. Write some XQuery and SQL to get some stats
See: query-sql-dsas.xq
See: query-sql-deaths-total-percentage.xq
Data is in RDF (eXist-db/TDB - 1,471,085 triples)
1. Examine the Data
2. Use the XQuery extension module - SPARQL
3. Write some XQuery and SPARQL to get some stats
See: query-sparql-all-states.xq
See: query-sparql-state-payment-total.xq
eXist-db's RDF/SPARQL Module is not returning all the data :-(
SPARQL Protocol to the rescue!
Load data into Jena TDB
Use XQuery extension module - EXPath HTTP
Query Fuseki over HTTP
See: query-fuseki-all-states.xq
Write our own XQuery Module for Jena Fuseki
For replacing sparql:query
with fuseki:query
See: fuseki.xqm
See: query-fuseki-all-states-2.xq
Death stats have State Names / Funding stats have State Abbrevs.
We want to join the datasets by state
We will map Abbrevs. to Names
Data is in JSON (eXist-db - 1 Map with 55 key/value pairs)
1. Examine the Data
2. Use the XQuery function - fn:json-doc
3. Write some XQuery to lookup names from abbreviations
See: query-json-state.xq
Our XQuery
Uses all of the components so far
Calculates some percentages of total deaths
Calculates some percentages of total funding relief
Iterates through deaths by state
Joins funding relief payments by state ON state name
Produces two reports in XML
Each ordered differently for side-by-side comparison
See: answer1.xq
Our XQuery
Builds on the previous slide
We write an XSLT to transform the XML into HTML
See: answer1-to-html.xslt
Use XQuery extension module - Transform
If available, use fn:transform
instead!
Produces HTML
Could be run from the database's Web Server :-)
See: answer1-html.xq
No!
We can and have survived without it!
XQuery - best to adopt a functional compsosition style
Why even use XQuery?
I could have written this in: Assembler / Pascal / Java / C++ / blah!
Yes!
Cleaner Separation of Concerns
Better Orchestration of Tasks
Exploit Parrallelism (one day?)
But... I want it to run inside the database!
Avoid Serialization between steps
Exploit Indexes
Resource Scheduling
fusiondb.com