One of the most common requests we've seen from the
CloudQuery Developer Community is the ability to flatten JSON source data into separate columns at their destination. Until now, we didn't really have a way to do this, even though source integrations have been exposing JSON schema information for a while.
Today, we're excited to announce the release of the JSON Flattener Transformation Integration, a new
CloudQuery Transformation Integration that enables you to flatten JSON source data into separate columns at your destination.
The problem #
Let's say your sync is producing this table in the destination:
account_id | region | details |
---|
012345678901 | us-west-1 | {"field_1": "2021-06-10 07:30:00", "field_2": "value_2", "field_3": true, "field_4": 4} |
012345678901 | ca-central-1 | {"field_1": "2021-07-25 12:40:00", "field_2": "value_4", "field_3": false, "field_4": 5} |
012345678901 | sa-east-1 | {"field_1": "2021-08-15 15:10:00", "field_2": "value_6", "field_3": null, "field_4": 6} |
Often, one would want the details
field to be flattened into separate columns, like this:
account_id | region | field_1 | field_2 | field_3 | field_4 |
---|
012345678901 | us-west-1 | 2021-06-10 07:30:00 | value_2 | true | 4 |
012345678901 | ca-central-1 | 2021-07-25 12:40:00 | value_4 | false | 5 |
012345678901 | sa-east-1 | 2021-08-15 15:10:00 | value_6 | null | 6 |
The solution #
The JSON Flattener Transformer Integration allows you to do just that! Without any additional configuration, it will flatten the details
field into separate columns, while maintaining the original details
field as well.
The transformed table will have the following columns:
account_id
(string)
region
(string)
details
(JSON)
field_1
(timestamp)
field_2
(string)
field_3
(boolean)
field_4
(integer)
One important caveat is that the
source integration must have exposed the JSON schema information for the
details
field. Although exposing JSON schema information has been supported in all CloudQuery Official plugins for the last 2 months, you may find that your plugin, or a particular table or field you're interested in does not expose this information yet. At the time of this writing, there's no way to provide this information from within the transformer plugin's spec.
A sample complete spec #
Use the following spec with the AWS source plugin to flatten the aws_ec2_instances
table.
kind: source
spec:
name: aws
path: cloudquery/aws
registry: cloudquery
version: "v28.1.0"
tables: ["aws_ec2_instances"]
destinations: ["postgresql"]
spec:
And a Postgres destination
kind: destination
spec:
name: "postgresql"
path: "cloudquery/postgresql"
registry: "cloudquery"
version: "v8.6.4"
write_mode: "overwrite-delete-stale"
transformers:
- jsonflattener # we add the JSON Flattener transformer here
spec:
connection_string: "..."
Don't forget to add the JSON Flattener integration to your spec.
kind: transformer
spec:
name: "jsonflattener"
registry: cloudquery
path: "cloudquery/jsonflattener"
version: "v1.0.0"
spec:
The JSON Flattener transformer integration is a powerful tool that can help solve many common data transformation challenges. By leveraging this integration, you can:
Simplify complex nested JSON structures, making them easier to query and analyze in your destination database.
Reduce storage requirements by combining flattening JSON objects and removing unneeded columns by leveraging the basic transformer.
Improve query performance by eliminating the need for complex JSON parsing operations in filtering, grouping, joining and ordering clauses.
You can find more information about the JSON Flattener transformer integration, including detailed configuration options and examples, in our
JSON Flattener transformer Integration documentation.
We're always looking to improve our tools and make them more useful for our community. Your feedback is invaluable in this process. If you have any questions, suggestions, or encounter any issues while using the JSON Flattener transformer plugin, please don't hesitate to reach out to us in the
CloudQuery Community. If you're ready to get started,
download the CloudQuery CLI today!