JSONata : The Swiss Army Knife of Kestra for JSON transformation

· 1392 words · 7 minute read

Photo by <a href="https://unsplash.com/@pf91_photography?utm_source=medium&amp;utm_medium=referral" target="_blank" rel="noopener">Patrick</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral" target="_blank" rel="noopener">Unsplash</a>

{JSON} is clearly one of the most frequently used and popular formats for storing or exchanging data through HTTP APIs, message brokers, or even object storage.

Manipulating JSON data has become part of any engineer’s daily routine. And, in today’s data-driven world, the ability to seamlessly transform and manipulate JSON data is essential for effective data integration and processing.

If you’ve ever worked with data integration workflows, you’ve probably already faced the need to find the right tool for the job!

Handling JSON data is not technically complicated, but it’s still tedious. In most cases, it means writing very specific code using your preferred programming or scripting language, such as Python, Java, or Bash. But, if the JSON transformation you need to perform turns out to be a bit complex, your code can quickly become overly complicated and result in something neither easy to read nor understand.

However, there’s a more efficient way to handle JSON transformations, especially within an orchestration platform like Kestra. And that’s where JSONata kicks in, the missing declarative language for engineers looking for an intuitive and efficient way to transform their JSON data.

In this blog post, I’d like to explore the various use cases for JSONata through practical examples and show you that it’s an ideal tool for transforming JSON data into Kestra workflows.

JSONata 🔗

JSONata (https://jsonata.org/) is a powerful and sophisticated query and transformation language for JSON allowing you to perform complex data manipulations, including filtering, mapping, and reducing.

JSONata — Logo

JSONata’s functionality is akin to what XSLT provides for XML and is inspired by the ‘location path’ semantics of XPath 3.1. It offers many built-in operators and functions to easily manipulate and combine data through a declarative syntax which is both expressive and compact.

Here are a few reasons why using JSON data :

  1. Flexibility: JSONata’s expressions allow for complex transformations that would be cumbersome with traditional scripting or programming languages.

  2. Simplicity: JSONata’s declarative approach means you can describe what you want to achieve without getting bogged down in the procedural details.

  3. Efficiency: JSONata is designed to work directly with JSON data structures, making transformations faster and more efficient than general-purpose languages.

Key Features of JSONata 🔗

  1. Path Expressions: Navigate through JSON structures easily using JSONata’s intuitive path expressions. e.g., account.orders.

  2. Functions: Built-in functions for common operations like string manipulation, arithmetic, and more, e.g., $substring(), $join(), $split(), $length().

  3. Lambda Expressions: Create reusable, anonymous functions for complex transformations, e.g., $map(), $reduce(), $filter().

  4. Aggregations: Perform operations like summing or averaging over arrays of data, e.g., $sum() , $min(), $max(), $average().

  5. Conditionals: Use if-else logic to create conditional transformations, i.e., using the ternary operator ?:.

If you want to try JSONata directly: JSONata Try.

Why Use JSONata with Kestra? 🔗

Kestra is an open-source unified orchestration platform designed to manage and automate complex business workflows. It excels in many areas, such as integrating various data sources, processing tasks, and data sinks.

Kestra — Logo

When it comes to transforming JSON data, integrating JSONata within Kestra can make these processes much more efficient and straightforward.

For example, JSONata can be leveraged part of your data integration workflows for:

  • Applying simple transformations, such as adding, modifying, or removing an attribute.

  • Extracting a specific subset of a complex JSON object.

  • Filtering some attributes based on their value.

  • Masking some attributes containing Personal Information Identifier (PII).

  • Migrating data source structure to the data sink structure or schema.

  • Joining multiple JSON objects.

JSONata in Action 🔗

Now, let’s explore JSONata expressions for performing common transformations, using practical examples.

For this, we will use the following sample JSON document:

{
    "account": {
        "firstName": "Han",
        "lastName": "Solo",
        "email": "han.solo@rebelalliance.org",
        "orders": [
            {
                "orderId": "order1",
                "timestamp": 1717430400, 
                "products": [
                    {
                    "productName": "Jedi Robe",
                    "price": 50,
                    "quantity": 2
                    },
                    {
                    "productName": "Droid Repair Kit",
                    "price": 100,
                    "quantity": 1
                    }
                ]
            },
            {
                "orderId": "order66",
                "timestamp": 1718601600,
                "products": [
                    {
                    "productName": "Lightsaber",
                    "price": 1000,
                    "quantity": 1
                    },
                    {
                    "productName": "Wookiee Bowcaster",
                    "price": 5000,
                    "quantity": 1
                    }
                ]
            }
        ]
    }
}

NOTE: Click on each link below each example to see the JSONata transformation in action.

Add a field 🔗

To add a static field, we can use one of the most useful JSONata’s functions: $merge(). This function merges an array of objects into a single object containing all the key/value pairs from each objects:

$merge([$, {'account': $merge([account, {'currency': 'credit'}])}])

Try it: https://try.jsonata.org/Q8HMdrqnM

Remove a field 🔗

To remove a field, we can use the $sift() function. This method returns an object that contains only the key/value pairs that satisfy a predicate function:

$merge([$, {'account': $sift($.account, function($v, $k) {$k != 'firstName' and $k != 'lastName'})}])

Try it: https://try.jsonata.org/mxHWcBSQ_

Merge Two fields 🔗

To combine two fields, we can use the concatenation operator &:

$merge([$, {'account': $merge([account, {'accountName': $lowercase(account.firstName & account.lastName)}])}])

Try it: https://try.jsonata.org/ataQwt23X

Filter Items 🔗

To filter items from an array, we can use a predicate expression, i.e.,[expr] where expr evaluates to a Boolean value.

$merge([$, {
    'account': $merge([account, {
    'orders': account.orders[products[][productName='Lightsaber']]
    }
])}])

Try it: https://try.jsonata.org/mavIPur-p

or the $filter() function:

$merge([$, {
    'account': $merge([account, {
    'orders': account.orders.$filter($, function($v, $i, $a) { 
        $v.$filter(products, function($v, $i, $a) {
            $v.productName = 'Lightsaber' 
        })
        })
}])}])

Mask a property value 🔗

To mask a value, built-in String functions like $replace() can be used:

$merge([$, {
    'account': $merge([account, 
    {'email': account.email.$replace(/@([^.]*)\./, "@*****.")}
    ])
}])

Try it: https://try.jsonata.org/C7Ekar8Oo

or by redeclaring the data structure (i.e., without using $merge):

{
    "account": {
    "accountName": account.accountName,
    "email": account.email.$replace(/@([^.]*)\./, "@*****."),
    "currency": account.currency,
    "orders": account.orders
    }
}

Aggregate Items 🔗

Numeric items can be aggregated using one of the built-in arithmetic methods such as $sum() :

    $sum(account.orders.products.(price * quantity))

Try it: https://try.jsonata.org/fdRA6nPQJ

Explode an Array 🔗

To transform or explode an Array, you can use the dot (.) operator which implements the for each and map functions.

In addition, the %. path operator can be used to access to the ‘parent’ of the current context value operation.

account.orders.{ 
    'order_id': orderId,
    'customer_name': $lowercase(%.firstName & %.lastName),
    'customer_email': %.email,
    'order_date': $fromMillis(timestamp),
    'order_total_price': $sum(products.(price * quantity)),
    'order_items': products.({`productName`: { 'qty': quantity, 'unit_price': price }})
    }

Try it: https://try.jsonata.org/U-rVFTl8j

Flatten JSON object 🔗

To perform more advance transformation, you can use the JSONata Functional Programming to define custom function.

The below example shows how to define a function to flatten any JSON object.

(
    /* Define the flatten function */
    $flatten := function($o, $prefix) { 
        $each($o, function($v, $k) {(
            $name := $join([$prefix,$k], '.');
            $type($v) = 'object' ? $flatten($v, $name) : 
            $type($v) = 'array' ? $merge($map($v, function($vv, $i, $a){ $flatten($vv, $name & '[' & $i & ']')})) :
            { $name: $v }
        )}) ~> $merge()
    };
    
    /* Flatten the object */
    $flatten(account)
)

Try it: https://try.jsonata.org/7WX_Oxh9B

Kestra for JSONata 🔗

Since the 0.18.0 release, Kestra has shipped with two built-in tasks for JSONata :

  • io.kestra.plugin.transform.jsonata.TransformItems — to transform a JSON (or ION) file containing one or multiple rows.

  • io.kestra.plugin.transform.jsonata.TransformValue — to transform a JSON object value.

Those two tasks are available through the plugin-transform-json.

For example, the Flow below consumes orders data from a Kafka source topic (using a real-time trigger), transforms the JSON data using JSONata TransformValue task, and finally sends the resulting data to a Kafka sink topic.

    id: realtime-kafka-pipeline
    namespace: sandbox
    tasks:
      - id: transform
        type: io.kestra.plugin.transform.jsonata.TransformValue
        from: "{{ trigger.value }}"
        expression: |
          account.orders.{ 
              'order_id': orderId,
              'customer_name': $lowercase(%.firstName & %.lastName),
              'customer_email': %.email,
              'order_date': $fromMillis(timestamp),
              'order_total_price': $sum(products.(price * quantity)),
              'order_items': products.({`productName`: { 'qty': quantity, 'unit_price': price }})
              }          
    
      - id: produce
        type: io.kestra.plugin.kafka.Produce
        topic: sink-user-orders
        from: "{{ outputs.transform.uri }}"
        properties:
          bootstrap.servers: localhost:9092
        serdeProperties:
          schema.registry.url: http://localhost:8085
        keySerializer: STRING
        valueSerializer: JSON
    
    triggers:
    - id: consume
      type: io.kestra.plugin.kafka.RealtimeTrigger
      topic: source-user-orders
      properties:
        bootstrap.servers: localhost:9092
      serdeProperties:
        schema.registry.url: http://localhost:8085
        keyDeserializer: STRING
        valueDeserializer: JSON
      groupId: kestra-order-consumer

Conclusion 🔗

JSONata is a powerful tool for JSON transformations, offering flexibility, simplicity, and efficiency through its declarative syntax. It excels in handling complex data manipulations with ease, reducing code complexity and improving readability.

When integrated with Kestra, JSONata enhances the platform’s ability to manage and automate complex business workflows involving JSON data. This combination streamlines the process of transforming JSON within your data integration tasks.

Ready to streamline your JSON transformations? Start exploring JSONata with Kestra today and see how it can simplify your data integration workflows. Visit JSONata and Kestra to get started.

If you like what we are doing, I kindly ask you to show your support by sharing/clapping this article and spreading the word📣. You can even show your support by giving a ⭐on Github! 😁

Thank you very much.

Follow-me on Twitter/X : @fhussonnois