engineering

Introducing The JSON Flattener Transformer Integration

Mariano Gappa

Mariano Gappa

One of the most common requests we've seen from the CloudQuery Developer Community is the ability to flatten JSON source data into separate columns at their destination. Until now, we didn't really have a way to do this, even though source integrations have been exposing JSON schema information for a while.
Today, we're excited to announce the release of the JSON Flattener Transformation Integration, a new CloudQuery Transformation Integration that enables you to flatten JSON source data into separate columns at your destination.

The problem #

Let's say your sync is producing this table in the destination:
account_idregiondetails
012345678901us-west-1{"field_1": "2021-06-10 07:30:00", "field_2": "value_2", "field_3": true, "field_4": 4}
012345678901ca-central-1{"field_1": "2021-07-25 12:40:00", "field_2": "value_4", "field_3": false, "field_4": 5}
012345678901sa-east-1{"field_1": "2021-08-15 15:10:00", "field_2": "value_6", "field_3": null, "field_4": 6}
Often, one would want the details field to be flattened into separate columns, like this:
account_idregionfield_1field_2field_3field_4
012345678901us-west-12021-06-10 07:30:00value_2true4
012345678901ca-central-12021-07-25 12:40:00value_4false5
012345678901sa-east-12021-08-15 15:10:00value_6null6

The solution #

The JSON Flattener Transformer Integration allows you to do just that! Without any additional configuration, it will flatten the details field into separate columns, while maintaining the original details field as well.
The transformed table will have the following columns:
  • account_id (string)
  • region (string)
  • details (JSON)
  • field_1 (timestamp)
  • field_2 (string)
  • field_3 (boolean)
  • field_4 (integer)
One important caveat is that the source integration must have exposed the JSON schema information for the details field. Although exposing JSON schema information has been supported in all CloudQuery Official plugins for the last 2 months, you may find that your plugin, or a particular table or field you're interested in does not expose this information yet. At the time of this writing, there's no way to provide this information from within the transformer plugin's spec.

A sample complete spec #

Use the following spec with the AWS source plugin to flatten the aws_ec2_instances table.
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v28.1.0"
  tables: ["aws_ec2_instances"]
  destinations: ["postgresql"]
  spec:
And a Postgres destination
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v8.6.4"
  write_mode: "overwrite-delete-stale"
  transformers:
    - jsonflattener # we add the JSON Flattener transformer here
  spec:
    connection_string: "..."
Don't forget to add the JSON Flattener integration to your spec.
kind: transformer
spec:
  name: "jsonflattener"
  registry: cloudquery
  path: "cloudquery/jsonflattener"
  version: "v1.0.0"
  spec:
The JSON Flattener transformer integration is a powerful tool that can help solve many common data transformation challenges. By leveraging this integration, you can:
  1. Simplify complex nested JSON structures, making them easier to query and analyze in your destination database.
  2. Reduce storage requirements by combining flattening JSON objects and removing unneeded columns by leveraging the basic transformer.
  3. Improve query performance by eliminating the need for complex JSON parsing operations in filtering, grouping, joining and ordering clauses.
You can find more information about the JSON Flattener transformer integration, including detailed configuration options and examples, in our JSON Flattener transformer Integration documentation.
We're always looking to improve our tools and make them more useful for our community. Your feedback is invaluable in this process. If you have any questions, suggestions, or encounter any issues while using the JSON Flattener transformer plugin, please don't hesitate to reach out to us in the CloudQuery Community. If you're ready to get started, download the CloudQuery CLI today!
Mariano Gappa

Written by Mariano Gappa

Mariano is a software engineer working at CloudQuery with 15 years of experience in the industry. His speciality is in improving performance and his work has reduced sync times and significantly improved CloudQuery's performance.

Turn cloud chaos into clarity

Find out how CloudQuery can help you get clarity from a chaotic cloud environment with a personalized conversation and demo.

Join our mailing list

Subscribe to our newsletter to make sure you don't miss any updates.

Legal

© 2024 CloudQuery, Inc. All rights reserved.

We use tracking cookies to understand how you use the product and help us improve it. Please accept cookies to help us improve. You can always opt out later via the link in the footer.