

Custom Pentaho Plugins

- Java Libraries designed to Pentaho's Interface
  - Tried to develop in Scala (other JVM langs?)... Abandoned!
- Who else here has built their own plugins?
- Our Plugin Projects for:
  - Apache Jena - https://github.com/nationalarchives/kettle-jena-plugins
    - RDF Graph Creation, Merging, and Serialization
    - SHACL Validation
  - Synchronisation - https://github.com/nationalarchives/kettle-atomic-plugins
  - XML - https://github.com/nationalarchives/kettle-xml-extra-plugins
  - Debugging - https://github.com/nationalarchives/kettle-debug-plugins

Why Build Plugins?

- Our default position is not to!
- We needed functionality not offered by Pentaho
  - or... it was too complex to implement as N separate steps
- Why not use a User Defined Java/JavaScript step?
  - Source Control
  - Reuse
    - Duplicate code -> More maintenance!
    - Can't publish as Open Source (for greater ROI)
  - Testing
    - Unit and Integration Tests... CI?
- We do use a few very small User Defined JavaScript steps!

Starting a new Step Plugin

- Official Documentation
  - https://help.hitachivantara.com/Documentation/Pentaho/9.2/Developer_center/Create_step_plugins
  - Enough to get you started
  - Lacking for any real-world use
  - Won't teach you the Eclipse SWT (UI) Toolkit or Pentaho SWT!
- Best Examples - Reading the code of Pentaho's Steps
  - https://github.com/pentaho/pentaho-kettle/tree/9.2.0.0-R/plugins
  - We learnt a great deal by studying:

Anatomy of a Step Plugin



- processRow is where your business logic happens! It is equivalent to a User Defined Java/JavaScript Step
- StepMeta glues everything together
- StepData holds state from row to row during the full transformation

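
The following minimal, self-contained Java sketch makes the anatomy concrete by showing how the three parts relate. It deliberately does not use Kettle's real interfaces (`StepMetaInterface`, `StepDataInterface`, `BaseStep`). The classes `UpperCaseMeta`, `UpperCaseData` and `UpperCaseStep` are hypothetical stand-ins, and a simple row queue takes the place of Kettle's `getRow()`/`putRow(...)`. What it does mirror, assuming the usual Kettle contract, is that `processRow` handles one row per call and returns `false` once the input is exhausted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a StepMeta class: holds settings and creates the step and its data. */
class UpperCaseMeta {
    final int fieldIndex; // hypothetical setting: which input field to upper-case

    UpperCaseMeta(int fieldIndex) {
        this.fieldIndex = fieldIndex;
    }

    UpperCaseData getStepData() {
        return new UpperCaseData();
    }

    UpperCaseStep getStep(List<Object[]> inputRows) {
        return new UpperCaseStep(inputRows);
    }
}

/** Hypothetical stand-in for a StepData class: mutable state that survives from row to row. */
class UpperCaseData {
    long rowsSeen = 0;
}

/** Hypothetical stand-in for the step itself; the engine calls processRow once per row. */
class UpperCaseStep {
    private final Deque<Object[]> input;             // stands in for the incoming row set
    final List<Object[]> output = new ArrayList<>(); // stands in for the outgoing row set

    UpperCaseStep(List<Object[]> inputRows) {
        this.input = new ArrayDeque<>(inputRows);
    }

    /** Returns true to be called again, false once there are no more input rows. */
    boolean processRow(UpperCaseMeta meta, UpperCaseData data) {
        Object[] row = input.poll(); // in Kettle this would be getRow()
        if (row == null) {
            return false;            // in Kettle you would call setOutputDone() first
        }
        data.rowsSeen++;
        Object[] out = Arrays.copyOf(row, row.length + 1);
        out[meta.fieldIndex] = String.valueOf(row[meta.fieldIndex]).toUpperCase();
        out[row.length] = data.rowsSeen; // new field: running row count taken from the data object
        output.add(out);                 // in Kettle this would be putRow(...)
        return true;
    }
}

public class StepAnatomySketch {
    public static void main(String[] args) {
        UpperCaseMeta meta = new UpperCaseMeta(0);
        UpperCaseData data = meta.getStepData();
        UpperCaseStep step = meta.getStep(Arrays.asList(new Object[] {"alice", 1}, new Object[] {"bob", 2}));
        while (step.processRow(meta, data)) {
            // the engine keeps calling processRow until it returns false
        }
        step.output.forEach(r -> System.out.println(Arrays.toString(r)));
    }
}
```

Running `main` prints `[ALICE, 1, 1]` and `[BOB, 2, 2]`. The counter lives in the data object, so it carries over between `processRow` calls, which is the role StepData plays across rows.
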
Apache Jena Step Plugins

- Create a Jena Model (RDF Graph) per-row
  - Maps fields in a row into RDF Properties
- Combine Jena Models per-row
  - Merges one-or-more Jena Models within the same row
- Group and Merge Jena Models per-column
  - Merges Models from consecutive rows within the same column
- Serialize Jena Model per-column per-transformation
  - Merges all Jena Models from a column and writes them to a file
- SHACL Validation
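
The following plain-Java sketch illustrates the create, merge and serialize stages. It assumes no Jena on the classpath and models a graph as a set of triples. Under that model, merging is a set union, which holds for graphs without blank nodes; in Jena itself you would typically use `Model.add(Model)` instead. `Triple`, `GraphSketch` and the field-to-property mapping are hypothetical, not the plugins' API, and every object is a plain string literal for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** One RDF statement whose object is a plain string literal (a simplification for this sketch). */
record Triple(String subject, String predicate, String literal) {
    /** One N-Triples line; backslash, quote and line breaks are escaped as N-Triples requires. */
    String toNTriples() {
        String escaped = literal.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
        return "<" + subject + "> <" + predicate + "> \"" + escaped + "\" .";
    }
}

/** Hypothetical helpers mirroring the per-row create / merge / serialize stages. */
final class GraphSketch {
    /** Per-row: one subject IRI, one triple per mapped field (field name -> property IRI). */
    static Set<Triple> modelForRow(String subjectIri, Map<String, String> row, Map<String, String> fieldToProperty) {
        Set<Triple> model = new LinkedHashSet<>();
        fieldToProperty.forEach((field, property) -> {
            String value = row.get(field);
            if (value != null) { // skip null fields instead of emitting empty literals
                model.add(new Triple(subjectIri, property, value));
            }
        });
        return model;
    }

    /** Without blank nodes, merging RDF graphs is a set union, so duplicate triples collapse. */
    @SafeVarargs
    static Set<Triple> merge(Set<Triple>... models) {
        Set<Triple> merged = new LinkedHashSet<>();
        for (Set<Triple> model : models) {
            merged.addAll(model);
        }
        return merged;
    }

    /** Serialize a graph as N-Triples text, one statement per line. */
    static String serialize(Set<Triple> model) {
        return model.stream().map(Triple::toNTriples).collect(Collectors.joining("\n", "", "\n"));
    }
}

public class RdfPipelineSketch {
    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "http://purl.org/dc/terms/title"); // hypothetical field-to-property mapping
        // two rows describing the same record, plus one describing another
        Set<Triple> row1 = GraphSketch.modelForRow("http://example.org/record/1", Map.of("title", "Domesday Book"), mapping);
        Set<Triple> row2 = GraphSketch.modelForRow("http://example.org/record/1", Map.of("title", "Domesday Book"), mapping);
        Set<Triple> row3 = GraphSketch.modelForRow("http://example.org/record/2", Map.of("title", "Magna Carta"), mapping);
        System.out.print(GraphSketch.serialize(GraphSketch.merge(row1, row2, row3)));
    }
}
```

`row1` and `row2` produce the same statement, so the merged graph serializes to two N-Triples lines, one per record. The same union applies whether models are merged within a row, across a group of consecutive rows, or across a whole column.
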

Transforming Relational to RDF


- Demo...

Synchronisation Step Plugins

- Compare and Set Atomic per-row
  - Conditionally initialise or CaS an Atomic Value
- Await Atomic per-row
  - Await an Atomic Value and conditionally branch
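
The following minimal Java sketch shows the two primitives, assuming each step runs on its own thread. `compareAndSet`/`initialiseIfAbsent` correspond to the Compare and Set step's conditional initialisation, and `await` corresponds to the Await step's wait-then-branch. `SharedAtomic` is a hypothetical class built on `java.util.concurrent.atomic.AtomicReference` and a `CountDownLatch`; it is not the plugins' implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical shared value that two steps (i.e. two threads) coordinate through. */
final class SharedAtomic<V> {
    private final AtomicReference<V> value = new AtomicReference<>();
    private final CountDownLatch firstSet = new CountDownLatch(1);

    /** Compare-and-set; note AtomicReference compares with ==, i.e. by identity, not equals(). */
    boolean compareAndSet(V expected, V update) {
        boolean swapped = value.compareAndSet(expected, update);
        if (swapped && update != null) {
            firstSet.countDown(); // release any thread waiting in await(); later calls are no-ops
        }
        return swapped;
    }

    /** Conditional initialisation: only the first caller wins. */
    boolean initialiseIfAbsent(V initial) {
        return compareAndSet(null, initial);
    }

    V get() {
        return value.get();
    }

    /** Waits until a value has been set; returns null on timeout so the caller can branch. */
    V await(long timeout, TimeUnit unit) {
        try {
            return firstSet.await(timeout, unit) ? value.get() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            return null;
        }
    }
}

public class AtomicStepsSketch {
    public static void main(String[] args) throws InterruptedException {
        SharedAtomic<String> shared = new SharedAtomic<>();
        // A second "step": waits for the value, then chooses a branch
        Thread awaitingStep = new Thread(() -> {
            String v = shared.await(5, TimeUnit.SECONDS);
            System.out.println(v != null ? "ready: " + v : "timed out");
        });
        awaitingStep.start();
        System.out.println(shared.initialiseIfAbsent("id-1")); // true: first initialisation wins
        System.out.println(shared.initialiseIfAbsent("id-2")); // false: already initialised
        awaitingStep.join();
    }
}
```

A latch is used instead of busy-polling so that the waiting thread sleeps until the first value arrives. Returning `null` on timeout gives the caller a clear condition to branch on.
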

- Allows us to perform several steps as one Atomic Operation
- Uses Java's Atomic values
- Concurrency - Can be Tricky to get right!
  - Remember - Every Step in Pentaho is a distinct Thread!
- Our Use Case - Get or Create (and Calculate) an Identifier

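
The following sketch shows the guarantee the get-or-create use case needs within a single JVM, using a swapped-in technique. Instead of chaining the Compare and Set and Await steps, it relies on `ConcurrentHashMap.computeIfAbsent`, which applies its mapping function at most once per absent key. `IdentifierRegistry` and the sequential numbering are hypothetical; a real identifier calculation would go inside the lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical registry: returns the existing identifier for a key, or mints the next one. */
final class IdentifierRegistry {
    private final ConcurrentHashMap<String, Long> ids = new ConcurrentHashMap<>();
    private final AtomicLong next = new AtomicLong(1);

    long getOrCreate(String naturalKey) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per absent key,
        // atomically, so two threads asking for the same key cannot mint two identifiers.
        return ids.computeIfAbsent(naturalKey, k -> next.getAndIncrement());
    }
}

public class GetOrCreateSketch {
    public static void main(String[] args) throws InterruptedException {
        IdentifierRegistry registry = new IdentifierRegistry();
        Set<Long> issued = ConcurrentHashMap.newKeySet();
        List<Thread> steps = new ArrayList<>();
        for (int i = 0; i < 4; i++) { // four threads standing in for concurrently running steps
            Thread t = new Thread(() -> {
                for (String key : List.of("record-a", "record-b", "record-a")) {
                    issued.add(registry.getOrCreate(key));
                }
            });
            steps.add(t);
            t.start();
        }
        for (Thread t : steps) {
            t.join();
        }
        System.out.println(issued.size()); // 2: exactly one identifier per distinct key
    }
}
```

`main` prints `2` regardless of how the threads interleave, because each distinct key receives exactly one identifier.
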
Synchronising Transformation Steps


- Demo...

Enhancing Pentaho Itself


- We chose Pentaho because it is Open Source
  - We have a mandate to evaluate Open Source first
- Pentaho (like all software) has Issues!
  - We have contributed fixes for:
    - Correct Date Time processing (pre 14th Sept 1752) #8006
      - See: https://blog.adamretter.org.uk/processing-historical-dates/
    - Correctly detecting JAVA_HOME #7023
    - Documentation about how to compile a distribution #7841
    - Correcting UI rendering on macOS
    - Fixed failing tests on Windows #8007

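
Some context for the date-time fix: Britain moved from the Julian to the Gregorian calendar in 1752, so Wednesday 2 September 1752 was followed by Thursday 14 September 1752. Java's legacy `GregorianCalendar` switches calendars on 15 October 1582 by default, and `java.time` is proleptic Gregorian, so neither matches British records from before the change without extra work. The following sketch is not the code from #8006. It only illustrates the calendar arithmetic, assuming a British cutover configured with `GregorianCalendar.setGregorianChange`. The names `BritishCutoverSketch` and `britishDaysBetween` are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class BritishCutoverSketch {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    /** A UTC GregorianCalendar whose Julian-to-Gregorian switch is Britain's, 14 Sep 1752. */
    static GregorianCalendar britishCalendar() {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        // java.time is proleptic Gregorian, so this instant is 14 Sep 1752 (Gregorian), 00:00 UTC
        Date cutover = Date.from(LocalDate.of(1752, 9, 14).atStartOfDay(ZoneOffset.UTC).toInstant());
        cal.setGregorianChange(cutover);
        return cal;
    }

    /** Days between two dates as written in British records (Julian before the cutover). */
    static long britishDaysBetween(int y1, int m1, int d1, int y2, int m2, int d2) {
        GregorianCalendar cal = britishCalendar();
        cal.clear();
        cal.set(y1, m1 - 1, d1); // Calendar months are zero-based
        long start = cal.getTimeInMillis();
        cal.clear();
        cal.set(y2, m2 - 1, d2);
        long end = cal.getTimeInMillis();
        return (end - start) / MILLIS_PER_DAY; // UTC has no DST, so every day has this many millis
    }

    public static void main(String[] args) {
        // British calendar: 2 Sep 1752 (Julian) was followed directly by 14 Sep 1752 (Gregorian)
        System.out.println(britishDaysBetween(1752, 9, 2, 1752, 9, 14)); // 1
        // java.time treats both as proleptic Gregorian dates, twelve days apart
        System.out.println(ChronoUnit.DAYS.between(LocalDate.of(1752, 9, 2), LocalDate.of(1752, 9, 14))); // 12
    }
}
```

The gap between the two results (1 day versus 12) is exactly the kind of discrepancy that affects historical records processed without an explicit cutover.
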
(not) Enhancing Pentaho Itself


- Only two of our most minor fixes have been incorporated
- In reality - Pentaho is only technically Open Source
  - There is no Open Source Community
- Contributing to Pentaho is (almost) Impossible!
  - We have sent high-quality code with tests, and with the full test suite passing
  - Developers are difficult to reach
  - Pull-Requests (or issues) can go unanswered "For Ever"
  - Pull-Requests can be closed without a working solution
  - Opening JIRA Tickets doesn't result in progress
- Hitachi Sales / Support
  - We would consider a contract... if we get the fixes we need!
-
Sharing is Caring!


- We are currently maintaining our fork of Pentaho Kettle 9.1
  - Not Practical for us
  - Updating is tricky... 9.2 is out now
  - Have to maintain skilled staff, GitHub, CI, etc.
- Not Sustainable for the Future... What are our options?
  - ...Would we choose Pentaho again?
Questions?


Adam Retter
Director of Evolved Binary
(Consultant) Technical Architect for Project Omega,
The National Archives


@adamretter
Pentaho Plugins and Enhancements
By Adam Retter
Talk given for Pentaho London Users Group on behalf of Project Omega at The National Archives - 10th Feb 2022