CloudQuery News
Introducing The JSON Flattener Transformer Integration
One of the most common requests we've seen from the CloudQuery Developer Community is the ability to flatten JSON source data into separate columns at their destination. Until now, we didn't really have a way to do this, even though source integrations have been exposing JSON schema information for a while.
Today, we're excited to announce the release of the JSON Flattener Transformation Integration, a new CloudQuery Transformation Integration that enables you to flatten JSON source data into separate columns at your destination.
The problem #
Let's say your sync is producing this table in the destination:
account_id | region | details |
---|---|---|
012345678901 | us-west-1 | {"field_1": "2021-06-10 07:30:00", "field_2": "value_2", "field_3": true, "field_4": 4} |
012345678901 | ca-central-1 | {"field_1": "2021-07-25 12:40:00", "field_2": "value_4", "field_3": false, "field_4": 5} |
012345678901 | sa-east-1 | {"field_1": "2021-08-15 15:10:00", "field_2": "value_6", "field_3": null, "field_4": 6} |
Often, one would want the
details
field to be flattened into separate columns, like this:account_id | region | field_1 | field_2 | field_3 | field_4 |
---|---|---|---|---|---|
012345678901 | us-west-1 | 2021-06-10 07:30:00 | value_2 | true | 4 |
012345678901 | ca-central-1 | 2021-07-25 12:40:00 | value_4 | false | 5 |
012345678901 | sa-east-1 | 2021-08-15 15:10:00 | value_6 | null | 6 |
The solution #
The JSON Flattener Transformer Integration allows you to do just that! Without any additional configuration, it will flatten the
details
field into separate columns, while maintaining the original details
field as well.The transformed table will have the following columns:
account_id
(string)region
(string)details
(JSON)field_1
(timestamp)field_2
(string)field_3
(boolean)field_4
(integer)
One important caveat is that the source integration must have exposed the JSON schema information for the
details
field. Although exposing JSON schema information has been supported in all CloudQuery Official plugins for the last 2 months, you may find that your plugin, or a particular table or field you're interested in does not expose this information yet. At the time of this writing, there's no way to provide this information from within the transformer plugin's spec.A sample complete spec #
Use the following spec with the AWS source plugin to flatten the
aws_ec2_instances
table.kind: source
spec:
name: aws
path: cloudquery/aws
registry: cloudquery
version: "v28.1.0"
tables: ["aws_ec2_instances"]
destinations: ["postgresql"]
spec:
And a Postgres destination
kind: destination
spec:
name: "postgresql"
path: "cloudquery/postgresql"
registry: "cloudquery"
version: "v8.6.4"
write_mode: "overwrite-delete-stale"
transformers:
- jsonflattener # we add the JSON Flattener transformer here
spec:
connection_string: "..."
Don't forget to add the JSON Flattener integration to your spec.
kind: transformer
spec:
name: "jsonflattener"
registry: cloudquery
path: "cloudquery/jsonflattener"
version: "v1.0.0"
spec:
The JSON Flattener transformer integration is a powerful tool that can help solve many common data transformation challenges. By leveraging this integration, you can:
- Simplify complex nested JSON structures, making them easier to query and analyze in your destination database.
- Reduce storage requirements by combining flattening JSON objects and removing unneeded columns by leveraging the basic transformer.
- Improve query performance by eliminating the need for complex JSON parsing operations in filtering, grouping, joining and ordering clauses.
You can find more information about the JSON Flattener transformer integration, including detailed configuration options and examples, in our JSON Flattener transformer Integration documentation.
We're always looking to improve our tools and make them more useful for our community. Your feedback is invaluable in this process. If you have any questions, suggestions, or encounter any issues while using the JSON Flattener transformer plugin, please don't hesitate to reach out to us in the CloudQuery Community. If you're ready to get started, download the CloudQuery CLI today!
Written by Mariano Gappa
Mariano is a software engineer working at CloudQuery with 15 years of experience in the industry. His speciality is in improving performance and his work has reduced sync times and significantly improved CloudQuery's performance.