Custom Pentaho Plugins

You Are Here

Why Build Plugins?

  • Our default position is not to!

  • We needed functionality not offered by Pentaho

    • or... too complex to implement as N steps

  • Why not use a User Defined Java/JavaScript step?

    • Source Control

    • Reuse

      • Duplicate code -> More maintenance!

      • Can't publish as Open Source (for greater RoI)

    • Testing

      • Unit and Integration Tests... CI?

    • We do use a few very small User Defined JavaScript steps!

Starting a new Step Plugin

Anatomy of a Step Plugin

processRow is where your business logic happens! It is equivalent to a User Defined Java/JavaScript step

glues everything

StepData holds state from row-to-row during the full transformation
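
As a rough, library-free sketch of how these pieces relate: the step's row method is called repeatedly, handles one row per call, and returns false once the input is exhausted, while a separate data object carries state between calls. The names `SketchStep`, `SketchStepData` and `SketchStepDemo`, and the queue-based row feed, are hypothetical stand-ins for illustration only; they are not Kettle's API, and a real plugin would build on Kettle's own base classes instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a step's data class: state kept from row to row.
class SketchStepData {
    long rowsSeen = 0;
}

// Hypothetical stand-in for the step class itself (not Kettle's API).
class SketchStep {
    private final Queue<Object[]> input;
    private final List<Object[]> output = new ArrayList<>();

    SketchStep(List<Object[]> rows) {
        this.input = new ArrayDeque<>(rows);
    }

    // Handles one row per call; returns false once there is no more input.
    boolean processRow(SketchStepData data) {
        Object[] row = input.poll();
        if (row == null) {
            return false; // no more rows: this step is finished
        }
        data.rowsSeen++;
        // "Business logic": append a running row number as an extra field.
        Object[] out = new Object[row.length + 1];
        System.arraycopy(row, 0, out, 0, row.length);
        out[row.length] = data.rowsSeen;
        output.add(out);
        return true; // ask to be called again for the next row
    }

    List<Object[]> output() {
        return output;
    }
}

class SketchStepDemo {
    // Drives the step the way an engine would: call processRow until it returns false.
    static List<Object[]> runAll(List<Object[]> rows) {
        SketchStep step = new SketchStep(rows);
        SketchStepData data = new SketchStepData();
        while (step.processRow(data)) {
            // one row handled per iteration
        }
        return step.output();
    }
}
```

Keeping the counter in the data object rather than in a local variable is the point of the sketch: anything that must survive from one row to the next lives there.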

Apache Jena Step Plugins

  1. Create a Jena Model (RDF Graph) per-row

    • Maps fields in a row into RDF Properties

  2. Combine Jena Models per-row

    • Merges one-or-more Jena Models within the same row

  3. Group and Merge Jena Models per-column

    • Merges Models from consecutive rows within the same column

  4. Serialize Jena Model per-column per-transformation

    • Merges all Jena Models (from a column), and writes a file

  5. SHACL Validation
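
The first four steps can be sketched without any libraries by treating an RDF graph as a set of (subject, predicate, object) triples, so that merging graphs is a set union. This is a deliberate simplification and not the plugins' actual code: the class `RdfRowSketch`, its method names, the field-to-property mapping and the group key are all assumptions made for illustration, and plain Java collections stand in for Jena's `Model`. Writing the merged graph to a file and SHACL validation are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A triple is a 3-element list (subject, predicate, object);
// a "model" is a set of triples. Both are stand-ins for Jena types.
class RdfRowSketch {

    // Step 1 (sketch): map the fields of one row to properties of one subject.
    // fieldToProperty maps a field name to a (hypothetical) property URI.
    static Set<List<String>> modelForRow(String subjectUri,
                                         Map<String, String> row,
                                         Map<String, String> fieldToProperty) {
        Set<List<String>> model = new LinkedHashSet<>();
        for (Map.Entry<String, String> mapping : fieldToProperty.entrySet()) {
            String value = row.get(mapping.getKey());
            if (value != null) {
                model.add(List.of(subjectUri, mapping.getValue(), value));
            }
        }
        return model;
    }

    // Steps 2 and 4 (sketch): merging graphs is a set union of their triples.
    static Set<List<String>> merge(List<Set<List<String>>> models) {
        Set<List<String>> merged = new LinkedHashSet<>();
        for (Set<List<String>> model : models) {
            merged.addAll(model);
        }
        return merged;
    }

    // Step 3 (sketch): merge the models of consecutive rows that share a group key.
    // keys.get(i) is the assumed group key of the row that produced models.get(i).
    static List<Set<List<String>>> groupConsecutive(List<String> keys,
                                                    List<Set<List<String>>> models) {
        List<Set<List<String>>> groups = new ArrayList<>();
        String previousKey = null;
        for (int i = 0; i < models.size(); i++) {
            if (i == 0 || !keys.get(i).equals(previousKey)) {
                groups.add(new LinkedHashSet<>()); // key changed: start a new group
            }
            groups.get(groups.size() - 1).addAll(models.get(i));
            previousKey = keys.get(i);
        }
        return groups;
    }
}
```

Because only consecutive rows are grouped, the sketch never needs to hold more than the current group in memory, but it does assume the rows arrive sorted by the group key.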

Transforming Relational to RDF

  • Demo...

Synchronisation Step Plugins

  1. Compare and Set Atomic per-row

    • Conditionally initialise or CaS an Atomic Value

  2. Await Atomic per-row

    • Wait for an Atomic Value and conditionally branch

  • Allows us to perform several steps as one Atomic Operation

    • Uses Java's Atomic values

    • Concurrency - Can be Tricky to get right!

    • Remember - Every Step in Pentaho is a distinct Thread!

  • Our Use Case - Get or Create (and Calculate) an Identifier
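
A minimal sketch of the get-or-create idea using Java's atomic classes, assuming a single shared slot that several step threads race to fill. The class `IdentifierSlot`, its method names and the polling-based wait are hypothetical and only illustrate the technique; they are not the plugins' implementation.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical shared slot for one identifier; null means "not created yet".
class IdentifierSlot {
    private final AtomicReference<Long> value = new AtomicReference<>();

    // Compare-and-set: only the first caller's candidate is stored;
    // every caller gets back whichever identifier won.
    long getOrCreate(long candidate) {
        value.compareAndSet(null, candidate);
        return value.get();
    }

    // Waits (by simple polling) until an identifier exists.
    // Returns null on timeout so the caller can branch on the outcome.
    Long await(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        Long current;
        while ((current = value.get()) == null) {
            if (System.nanoTime() - deadline >= 0) {
                return null;
            }
            Thread.sleep(1);
        }
        return current;
    }
}

class IdentifierSlotDemo {
    // Two threads race to create the identifier; both must observe the same winner.
    static long[] race() throws InterruptedException {
        IdentifierSlot slot = new IdentifierSlot();
        long[] seen = new long[2];
        Thread a = new Thread(() -> seen[0] = slot.getOrCreate(100));
        Thread b = new Thread(() -> seen[1] = slot.getOrCreate(200));
        a.start();
        b.start();
        a.join();
        b.join();
        return seen;
    }
}
```

Which candidate wins the race is not deterministic; the guarantee is only that both threads end up with the same value, which is what makes the check-then-create sequence behave as one atomic operation.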

Synchronising Transformation Steps

  • Demo...

Enhancing Pentaho Itself

  • We chose Pentaho because it is Open Source

    • We have a mandate to evaluate Open Source first

  • Pentaho (like all software) has Issues!

  • We have contributed fixes for:

(not) Enhancing Pentaho Itself

  • Only two of our most minor fixes have been incorporated

  • In reality - Pentaho is only technically Open Source

    • There is no Open Source Community

    • Contributing to Pentaho is (almost) Impossible!

      • We have sent high-quality code with tests and a 100% test suite pass rate

      • Developers are difficult to reach

      • Pull-Requests (or issues) can go unanswered "forever"

      • Pull-Requests can be closed without a working solution

      • Opening JIRA Tickets doesn't result in progress

  • Hitachi Sales / Support

    • We would consider a contract... if we get the fixes we need!

Sharing is Caring!

  • We are currently maintaining our fork of Pentaho Kettle 9.1

    • Not Practical for us

    • Updating is tricky... 9.2 is out now

    • Have to maintain skilled staff, GitHub, CI, etc.

  • Not Sustainable for the Future... What are our options?

    • ...Would we choose Pentaho again?


Adam Retter
Director of Evolved Binary

(Consultant) Technical Architect for Project Omega, The National Archives


Pentaho Plugins and Enhancements

By Adam Retter

Talk given for Pentaho London Users Group on behalf of Project Omega at The National Archives - 10th Feb 2022