Adam Retter
adam@evolvedbinary.com
MarkupUK @ Online
2020-06-25
@adamretter
XQuery
is the
Plumber's Toolkit!
About Me
-
Developer / Consultant
-
XQuery / XSLT / Schema / RelaxNG
-
Scala / Java / C++
-
Concurrency and Scalability
-
-
Creator of FusionDB multi-model database (4 yrs.)
-
Contributor to Facebook's RocksDB (4 yrs.)
-
Core contributor to eXist-db XML Database (14 yrs.)
-
Founder of EXQuery, and creator of RESTXQ
-
W3C XQuery WG Invited expert
A long time ago...
-
Before XProc there was XPL
-
~2005 - https://www.w3.org/Submission/xpl/
-
-
Before XPL there was XQuery
-
~2001 - https://www.w3.org/TR/2001/WD-xquery-20010215/
-
-
Before XQuery there was XPath
-
~1999 - https://www.w3.org/1999/07/WD-xpath-19990709
-
-
Before XPath there was XSLT
-
~1998 - https://www.w3.org/TR/1998/WD-xsl-19980818
-
Do we need XProc?
-
Back in 2006...
-
XPL looked nice but was Orbeon only
-
Apache Cocoon was ugly... and dying?
-
So... I wrote "pipelines" in XQuery (and XSLT)!
-
Many modules of small functions
-
XQuery -> XQuery -> XSLT -> XQuery -> XSLT -> XHTML / XUpdate
-
-
-
Now...
-
Anything XProc can do... I can do in XQuery!
-
Maybe I can even do more!
-
Why do I need XProc?
-
-
The power of XQuery
-
Is it's simplicity!
-
Data Model supports XML/JSON/Text/Binary
-
Mixing XML and XQuery in the same file
-
Creates dynamic documents
-
-
Depends on the Implementation!
-
HTTP Server fits well with the Web
-
Can produce XHTML/HTML5 dynamic websites
-
-
-
Much comes from EXPath/EXQuery/Vendor extensions!
Today's XQuery Challenges...
-
Is there a proportionate relationsip between Covid-19 mortalities and Covid-19 relief funding?
-
Is more or less medical research conducted in areas with a higher Covid-19 mortality rate?
* I am not a Data Scientist!
-
Today we will aggregate data from...
-
Various Web API using HTTP
-
Relational database using SQL
-
Graph (RDF) using SPARQL
-
JSON
-
XML Database
-
Binary files
-
-
We will then produce...
-
An HTML Page using XSLT
-
A PDF Report using XSL:FO
-
XQuery can integrate all things!
-
Medical Research Journal Articles
-
Statistics on Covid-19 Deaths
-
Statistics on Covid-19 Relief Funding
-
Mapping to/from State Abbreviations and State Names
To answer the challenges...
we need data!
-
National Library of Medicine
-
Pubmed dataset - keyword "covid"
-
Provides EUtils Web API
-
Available in XML
-
Limited to 10,000 records per-request
-
-
Our XQuery
-
Use XQuery extension module - EXPath HTTP
-
Call Search API
-
Call Fetch API n-times in pages of 10,000 results
-
Use XQuery extension module - XMLDB
-
Store XML documents into XML Database
-
See:
download-pubmed-store-xml.xq
-
Medical Research Journal Articles
-
Centers for Disease Control and Prevention
-
Provisional COVID-19 Death Counts by Sex, Age, and State
-
https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-S/9bhg-hcku
-
Available in CSV
-
-
Our XQuery
-
Use XQuery extension module - EXPath HTTP
-
Call Download API
-
Parse the CSV into XDM Array
-
Cleanup CSV
-
Use XQuery extension module - SQL JDBC
-
Create SQL Table and Insert rows (MariaDB)
-
See:
download-cdc-upload-sql.xq
-
Statistics on Covid-19 Deaths
-
Centers for Disease Control and Prevention
-
HHS (Health and Human Services) Provider Relief Fund
-
https://data.cdc.gov/Administrative/HHS-Provider-Relief-Fund/kh8y-3es6
-
Available in RDF
-
-
Our XQuery
-
Use XQuery extension module - EXPath HTTP
-
Call Download API
-
Use XQuery extension module - XMLDB
-
Store RDF document into XML Database *
-
See:
download-hhs-store-rdf.xq
-
Statistics on Covid-19 Relief Funding
-
World Population Review
-
List of State Abbreviations
-
https://worldpopulationreview.com/states/state-abbreviations/
-
Available in JSON
-
-
Our XQuery
-
Use XQuery extension module - EXPath HTTP
-
Call Download API
-
Use XQuery extension module - XMLDB
-
Store JSON document into XML Database
-
See:
download-states-store-json.xq
-
Mapping to/from State Abbreviations and State Names
Now we have the data!
We have to query and join the different (SQL, RDF, JSON, XML)
datasets together...
But, to answer our questions:
-
Data is in SQL (MariaDB - 1,416 rows)
-
1. Examine the Data
-
2. Use the XQuery extension module - SQL JDBC
-
3. Write some XQuery and SQL to get some stats
-
See:
query-sql-dsas.xq
-
See:
query-sql-deaths-total-percentage.xq
-
Step 1. Query the Death Statistics
-
Data is in RDF (eXist-db/TDB - 1,471,085 triples)
-
1. Examine the Data
-
2. Use the XQuery extension module - SPARQL
-
3. Write some XQuery and SPARQL to get some stats
-
See:
query-sparql-all-states.xq
-
See:
query-sparql-state-payment-total.xq
-
Step 2. Query the Relief Funding Statistics
-
eXist-db's RDF/SPARQL Module is not returning all the data :-(
-
SPARQL Protocol to the rescue!
-
Load data into Jena TDB
-
Use XQuery extension module - EXPath HTTP
-
Query Fuseki over HTTP
-
See:
query-fuseki-all-states.xq
-
Write our own XQuery Module for Jena Fuseki
-
For replacing
sparql:query
withfuseki:query
-
See:
fuseki.xqm
-
-
See:
query-fuseki-all-states-2.xq
-
Step 2/2. Ooops! ...Switch to Jena Fuseki
-
Death stats have State Names / Funding stats have State Abbrevs.
-
We want to join the datasets by state
-
We will map Abbrevs. to Names
-
-
Data is in JSON (eXist-db - 1 Map with 55 key/value pairs)
-
1. Examine the Data
-
2. Use the XQuery function -
fn:json-doc
-
3. Write some XQuery to lookup names from abbreviations
-
See:
query-json-state.xq
-
Step 3. Map State Abbreviations to Names
-
Our XQuery
-
Uses all of the components so far
-
Calculates some percentages of total deaths
-
Calculates some percentages of total funding relief
-
Iterates through deaths by state
-
Joins funding relief payments by state ON state name
-
-
Produces two reports in XML
-
Each ordered differently for side-by-side comparison
-
-
See:
answer1.xq
-
Step 4. Join Datasets and Create Reports
-
Our XQuery
-
Builds on the previous slide
-
We write an XSLT to transform the XML into HTML
-
See:
answer1-to-html.xslt
-
-
Use XQuery extension module - Transform
-
If available, use
fn:transform
instead!
-
-
Produces HTML
-
Could be run from the database's Web Server :-)
-
-
See:
answer1-html.xq
-
Step 5. Create HTML Reports
That buttery smooth level of integration
was amazing!!!
So... do we even need XProc?
Do we need XProc?
-
No!
-
We can and have survived without it!
-
XQuery - best to adopt a functional compsosition style
-
Why even use XQuery?
-
I could have written this in: Assembler / Pascal / Java / C++ / blah!
-
-
-
Yes!
-
Cleaner Separation of Concerns
-
Better Orchestration of Tasks
-
Exploit Parrallelism (one day?)
-
But... I want it to run inside the database!
-
Avoid Serialization between steps
-
Exploit Indexes
-
Resource Scheduling
-
-
fusiondb.com
XQuery is the Plumber's Toolkit
By Adam Retter
XQuery is the Plumber's Toolkit
Webinar given for MarkupUK - 25 June 2020 - Online
- 2,318