Custom Pentaho Plugins
Java libraries written against Pentaho's interfaces
Tried to develop in Scala (other JVM langs?)... Abandoned!
Who else here has built their own plugins?
Our Plugin Projects for:
Apache Jena
RDF Graph Creation, Merging, and Serialization
SHACL Validation
Synchronisation
Debugging
Why Build Plugins?
Our default position is not to!
We needed functionality not offered by Pentaho
or... too complex to implement as N steps
Why not use a User Defined Java/JavaScript step?
Source Control
Duplicate code -> More maintenance!
Can't publish as Open Source (for greater ROI)
Unit and Integration Tests... CI?
We do use a few very small User Defined JavaScript steps!
Starting a new Step Plugin
Official Documentation
Developer_center/Create_step_plugins -
Enough to get you started
Lacking for any real-world purpose
Won't teach you Eclipse SWT (UI) Toolkit or Pentaho SWT!
Best Examples - Reading the code of Pentaho's Steps
We learnt a great deal by studying:
Anatomy of a Step Plugin
processRow is where your business logic happens! It is equivalent to a User Defined Java/JavaScript step
glues everything
StepData holds state from row-to-row during the full transformation
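The split between processRow and StepData can be sketched in plain Java. This is only an illustration of the pattern, not the Kettle API: SketchStep, SketchData and run are hypothetical stand-ins, and the convention that processRow returns false once the input is exhausted is an assumption about how the real contract behaves.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class StepAnatomySketch {

    /** Hypothetical stand-in for StepData: state that survives from row to row. */
    static final class SketchData {
        long rowsSeen = 0;
    }

    /** Hypothetical stand-in for the Step class that owns processRow. */
    static final class SketchStep {
        private final Deque<Object[]> input;
        private final List<Object[]> output = new ArrayList<>();

        SketchStep(List<Object[]> rows) {
            this.input = new ArrayDeque<>(rows);
        }

        /** Handles one row; returns false when there is no more input (assumed contract). */
        boolean processRow(SketchData data) {
            Object[] row = input.poll();
            if (row == null) {
                return false; // input exhausted
            }
            data.rowsSeen++; // state kept across calls
            // "Business logic": append a running row number as a new field.
            Object[] out = Arrays.copyOf(row, row.length + 1);
            out[row.length] = data.rowsSeen;
            output.add(out);
            return true;
        }

        List<Object[]> output() {
            return output;
        }
    }

    /** Plays the role of the engine: calls processRow until it reports completion. */
    static List<Object[]> run(List<Object[]> rows) {
        SketchData data = new SketchData();
        SketchStep step = new SketchStep(rows);
        while (step.processRow(data)) {
            // keep pulling rows
        }
        return step.output();
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"a"});
        rows.add(new Object[] {"b"});
        for (Object[] row : run(rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Keeping the counter in SketchData rather than in a local variable is the point: anything that must outlive a single processRow call lives in the data object.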
Apache Jena Step Plugins
Create a Jena Model (RDF Graph) per-row
Maps fields in a row into RDF Properties
Combine Jena Models per-row
Merges one-or-more Jena Models within the same row
Group and Merge Jena Models per-column
Merges Models from consecutive rows within the same column
Serialize Jena Model per-column per-transformation
Merges all Jena Models (from a column), and writes a file
SHACL Validation
Transforming Relational to RDF
Synchronisation Step Plugins
Compare and Set Atomic per-row
Conditionally initialise or CaS an Atomic Value
Await Atomic per-row
Wait for an Atomic Value and conditionally branch
Allows us to perform several steps as one Atomic Operation
Uses Java's Atomic values
Concurrency - Can be Tricky to get right!
Remember - Every Step in Pentaho is a distinct Thread!
Our Use Case - Get or Create (and Calculate) an Identifier
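The idea of guarding several operations with one atomic value can be sketched with Java's own atomics. This is a minimal illustration and not the plugins' code: AtomicGuardSketch, getOrCreateId and runDemo are hypothetical names, and a plain counter stands in for the real identifier calculation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

final class AtomicGuardSketch {

    // false = free, true = some thread is inside the guarded section.
    private final AtomicBoolean busy = new AtomicBoolean(false);

    // Deliberately not thread-safe: only touched while 'busy' is held.
    private final Map<String, Integer> identifiers = new HashMap<>();
    private int nextId = 1;

    /** Get-or-create: the lookup, the calculation and the store act as one atomic operation. */
    int getOrCreateId(String key) {
        // Wait until we are the thread that flips the flag from free to held.
        while (!busy.compareAndSet(false, true)) {
            Thread.onSpinWait();
        }
        try {
            Integer existing = identifiers.get(key);
            if (existing != null) {
                return existing;
            }
            int created = nextId++; // stand-in for "calculate an identifier"
            identifiers.put(key, created);
            return created;
        } finally {
            busy.set(false); // release so waiting threads can proceed
        }
    }

    /** Runs several threads against one guard and returns the identifiers they agreed on. */
    static Map<String, Integer> runDemo(int threadCount) {
        AtomicGuardSketch sketch = new AtomicGuardSketch();
        String[] keys = {"a", "b", "c"};
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    sketch.getOrCreateId(keys[i % keys.length]);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // All workers have been joined, so reading the map here is safe.
        return new HashMap<>(sketch.identifiers);
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4));
    }
}
```

However the threads interleave, each key ends up with exactly one identifier. A spin-wait keeps the sketch short; a real step would more likely block or yield while it waits.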
Synchronising Transformation Steps
Enhancing Pentaho Itself
We chose Pentaho because it is Open Source
We have a mandate to evaluate Open Source first
Pentaho (like all software) has Issues!
We have contributed fixes for:
Correct Date Time processing (pre 14th Sept 1752) #8006
Correctly detecting JAVA_HOME #7023
Documentation about how to compile a distribution #7841
Correcting UI rendering on macOS
Fixed failing tests on Windows #8007
(not) Enhancing Pentaho Itself
Only two of our most minor fixes have been incorporated
In reality - Pentaho is only technically Open Source
There is no Open Source Community
Contributing to Pentaho is (almost) Impossible!
We have submitted high-quality code with tests and a 100% passing test suite
Developers are difficult to reach
Pull-Requests (or issues) can go unanswered "For Ever"
Pull-Requests can be closed without a working solution
Opening JIRA Tickets doesn't result in progress
Hitachi Sales / Support
We would consider a contract... if we get the fixes we need!
Sharing is Caring!
We are currently maintaining our fork of Pentaho Kettle 9.1
Not Practical for us
Updating is tricky... 9.2 is out now
Have to maintain skilled staff, GitHub, CI, etc.
Not Sustainable for the Future... What are our options?
...Would we choose Pentaho again?
Adam Retter
Director of Evolved Binary
(Consultant) Technical Architect for Project Omega,
The National Archives
Pentaho Plugins and Enhancements
By Adam Retter
Talk given for Pentaho London Users Group on behalf of Project Omega at The National Archives - 10th Feb 2022