You Are Here
Java Libraries written against Pentaho's Interfaces
Tried to develop in Scala (other JVM langs?)... Abandoned!
Who else here has built their own plugins?
Our Plugin Projects for:
Apache Jena - https://github.com/nationalarchives/kettle-jena-plugins
RDF Graph Creation, Merging, and Serialization
SHACL Validation
Synchronisation - https://github.com/nationalarchives/kettle-atomic-plugins
XML - https://github.com/nationalarchives/kettle-xml-extra-plugins
Debugging - https://github.com/nationalarchives/kettle-debug-plugins
Our default position is not to!
We needed functionality not offered by Pentaho
or... too complex to implement as N steps
Why not use a User Defined Java/JavaScript step?
Source Control
Reuse
Duplicate code -> More maintenance!
Can't publish as Open Source (for greater RoI)
Testing
Unit and Integration Tests... CI?
We do use a few very small User Defined JavaScript steps!
Official Documentation
https://help.hitachivantara.com/Documentation/Pentaho/9.2/Developer_center/Create_step_plugins
Enough to get you started
Not enough for any real-world plugin
Won't teach you Eclipse SWT (UI) Toolkit or Pentaho SWT!
Best Examples - Reading the code of Pentaho's Steps
https://github.com/pentaho/pentaho-kettle/tree/9.2.0.0-R/plugins
We learnt a great deal by studying:
processRow is where your business logic happens! It is equivalent to a User Defined Java/JavaScript Step
StepMeta glues everything together
StepData holds state from row-to-row during the full transformation
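The split between the three classes can be sketched without the Kettle libraries. What follows is a simplified, self-contained stand-in and not the Pentaho API: `SketchStepMeta`, `SketchStepData`, `SketchStep`, and `run` are hypothetical names, and a queue and a list stand in for the row sets a real step reads from and writes to.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Queue;

/** Hypothetical stand-in for a StepMeta: the step's configuration. */
class SketchStepMeta {
    final int fieldIndex; // index of the field to upper-case

    SketchStepMeta(int fieldIndex) {
        this.fieldIndex = fieldIndex;
    }
}

/** Hypothetical stand-in for a StepData: state carried from row to row. */
class SketchStepData {
    long rowsSeen = 0;
}

/** Hypothetical stand-in for the step class that owns processRow. */
class SketchStep {
    private final Queue<Object[]> input;
    private final List<Object[]> output = new ArrayList<>();

    SketchStep(List<Object[]> rows) {
        this.input = new ArrayDeque<>(rows);
    }

    /** Handles one row; returns false once the input is exhausted. */
    boolean processRow(SketchStepMeta meta, SketchStepData data) {
        Object[] row = input.poll();
        if (row == null) {
            return false; // no more rows: the step is finished
        }
        data.rowsSeen++; // state that survives between calls
        // Copy the row with one extra slot for the running row number.
        Object[] out = Arrays.copyOf(row, row.length + 1);
        out[meta.fieldIndex] =
            String.valueOf(row[meta.fieldIndex]).toUpperCase(Locale.ROOT);
        out[row.length] = data.rowsSeen;
        output.add(out);
        return true;
    }

    /** Drives processRow until it reports that it is done. */
    static List<Object[]> run(List<Object[]> rows, int fieldIndex) {
        SketchStep step = new SketchStep(rows);
        SketchStepMeta meta = new SketchStepMeta(fieldIndex);
        SketchStepData data = new SketchStepData();
        while (step.processRow(meta, data)) {
            // one row per call
        }
        return step.output;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"alpha", 1});
        rows.add(new Object[] {"beta", 2});
        for (Object[] out : run(rows, 0)) {
            System.out.println(Arrays.toString(out));
        }
    }
}
```

The configuration never changes while rows flow, the data object is the only place mutable state lives, and `processRow` is called once per row until it returns false.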
Create a Jena Model (RDF Graph) per-row
Maps fields in a row into RDF Properties
Combine Jena Models per-row
Merges one-or-more Jena Models within the same row
Group and Merge Jena Models per-column
Merges Models from consecutive rows within the same column
Serialize Jena Model per-column per-transformation
Merges all Jena Models (from a column), and writes a file
SHACL Validation
Demo...
Compare and Set Atomic per-row
Conditionally initialise or CaS an Atomic Value
Await Atomic per-row
Wait for an Atomic Value and conditionally branch
Allows us to perform several steps as one Atomic Operation
Uses Java's Atomic values
Concurrency - Can be Tricky to get right!
Remember - Every Step in Pentaho is a distinct Thread!
Our Use Case - Get or Create (and Calculate) an Identifier
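The get-or-create idea can be illustrated with Java's own atomic types. This is a minimal sketch under stated assumptions, not the plugin's implementation: `IdentifierCell`, `getOrCreate`, and `runConcurrently` are hypothetical names, and the identifier is assumed to be a simple counter-derived string.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder for an identifier shared between threads. */
class IdentifierCell {
    private final AtomicReference<String> value = new AtomicReference<>();

    /** Returns the published identifier, creating it if none exists yet. */
    String getOrCreate(Supplier<String> calculate) {
        String existing = value.get();
        if (existing != null) {
            return existing; // already created by some thread
        }
        String candidate = calculate.get();
        // compareAndSet succeeds for exactly one thread; the others
        // discard their candidate and read the winner's value.
        return value.compareAndSet(null, candidate) ? candidate : value.get();
    }

    /** Calls getOrCreate from several threads and collects what they saw. */
    static Set<String> runConcurrently(int threadCount) {
        IdentifierCell cell = new IdentifierCell();
        AtomicLong counter = new AtomicLong();
        Set<String> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() ->
                seen.add(cell.getOrCreate(() -> "ID-" + counter.incrementAndGet())));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Every thread should observe the same single identifier.
        System.out.println(runConcurrently(8));
    }
}
```

Several threads may each run `calculate` before one of them wins the compare-and-set; only the winner's value is published. If the calculation is expensive or has side effects, that needs separate coordination.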
Demo...
We chose Pentaho because it is Open Source
We have a mandate to evaluate Open Source first
Pentaho (like all software) has Issues!
We have contributed fixes for:
Correct Date Time processing (pre 14th Sept 1752) #8006
See: https://blog.adamretter.org.uk/processing-historical-dates/
Correctly detecting JAVA_HOME #7023
Documentation about how to compile a distribution #7841
Correcting UI rendering on macOS
Fixed failing tests on Windows #8007
Only two of our most minor fixes have been incorporated
In reality - Pentaho is only technically Open Source
There is no Open Source Community
Contributing to Pentaho is (almost) Impossible!
We have submitted high-quality code with tests and a 100% passing test suite
Developers are difficult to reach
Pull-Requests (or issues) can go unanswered "For Ever"
Pull-Requests can be closed without a working solution
Opening JIRA Tickets doesn't result in progress
Hitachi Sales / Support
We would consider a contract... if we get the fixes we need!
We are currently maintaining our fork of Pentaho Kettle 9.1
Not Practical for us
Updating is tricky... 9.2 is out now
Have to maintain skilled staff, GitHub, CI, etc.
Not Sustainable for the Future... What are our options?
...Would we choose Pentaho again?