aws
solutions
transformations

Tracking Changes in AWS Resource Tagging Over Time

Jonathan Sarig

Jonathan Sarig

AWS resource tags are a way to add metadata to your AWS resources for a variety of purposes, such as Attribute Based Access Control (ABAC), Cloud Financial Management, and automation.
Keeping track of your AWS resource tags is crucial for effective management and cost allocation. AWS resources such as EC2 instances, Lambda functions, and S3 buckets can have tags that change over time, affecting your organization’s resource management and billing. With CloudQuery, you can export your cloud data to any database available in our destination plugins. This allows you to track changes to your AWS resource tags efficiently.

Getting Started with CloudQuery #

To get started, download the CloudQuery CLI and set up your CloudQuery configuration file by following the AWS Source Plugin documentation. When writing your CloudQuery AWS configuriation, make sure that you add the tables for the AWS resources you want to track. For this example, you will want to focus on aws_lambda_functions. However, you can use any AWS resource. Your schema will contain the _cq_sync_time, arn and tags columns for any AWS resource.
Ensure your CloudQuery configuration includes append mode for syncing data over time. This will allow you to gather historical data on tags.
To do that, make sure you include write_mode: append in your CloudQuery destination configuration and deterministic_cq_id: true in your AWS configuration. Here is an example using the PostgreSQL Destination Plugin:
kind: source
spec:
  name: "aws"
  path: "cloudquery/aws"
  version: "VERSION_SOURCE_AWS"
  tables: ["aws_lambda_functions"] 
  deterministic_cq_id: true
  destinations: ["postgresql"]
  spec:
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  version: "VERSION_DESTINATION_POSTGRESQL"
  write_mode: "append"
  spec:
    connection_string: "<YOUR_POSTGRES_CONNECTION_STRING>"
Ensure your configuration uses the latest plugin versions, and includes all the settings relevant to you use case by visiting the AWS Source plugin docs and PostgreSQL destination plugin docs.

Tracking AWS Lambda Function Tags #

Once you’ve synced data from AWS, the table aws_lambda_functions will be created. This table includes a tags column that contains a JSON object representing the tags of each Lambda function. To analyze changes in these tags over time, we can use SQL’s Lag() window function in combination with a few PostgreSQL JSON functions.
WITH prev_tag AS (
    SELECT 
        arn,
        _cq_sync_time,
        tags,
        LAG(tags) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_tags
    FROM 
        aws_lambda_functions
    ORDER BY 
        arn, _cq_sync_time
)
SELECT 
    _cq_sync_time,
    arn,
    tags, 
    previous_tags, 
    jsonb_build_object(
        'removed', 
        (
            SELECT jsonb_object_agg(key, value)
            FROM (
                SELECT j1.key, j1.value
                FROM jsonb_each(previous_tags) j1
                LEFT JOIN jsonb_each(tags) j2 ON j1.key = j2.key
                WHERE j2.key IS NULL
            ) AS removed
        ),
        'added', 
        (
            SELECT jsonb_object_agg(key, value)
            FROM (
                SELECT j1.key, j1.value
                FROM jsonb_each(tags) j1
                LEFT JOIN jsonb_each(previous_tags) j2 ON j1.key = j2.key
                WHERE j2.key IS NULL
            ) AS added
        ),
        'changed', 
        (
            SELECT jsonb_object_agg(key, value)
            FROM (
                SELECT j1.key, j1.value
                FROM jsonb_each(tags) j1
                JOIN jsonb_each(previous_tags) j2 ON j1.key = j2.key
                WHERE j1.value <> j2.value
            ) AS changed
        )
    ) AS tag_diff
FROM 
    prev_tag
WHERE previous_tags IS NOT NULL
ORDER BY arn, _cq_sync_time;
In this query, the main column is tag_diff, which will hold a JSON showing the change between the tags from the previous sync to the current for each row. For example, if there is no change in tags, you will see:
{
	"added": null, 
	"changed": null, 
	"removed": null
}
However, if your team starts tagging resources with an Owner tag, you will see:
{
	"added": {
			"Owner": "Bob"
			}, 
	"changed": null, 
	"removed": null
}
If there is a new owner for the function in the next sync, you will see:
{
	"added": null, 
	"changed": {
			"Owner": "Alice"
			}, 
	"removed": null
}
The changed property will hold the key and the new value of the tag.
Lastly, if your team decides to stop using the Owner tag in the next sync, you will see:
{
    "removed": {
        "Owner": "Alice"
    },
    "added": null,
    "changed": null
}
To simplify this query and make the JSON comparison reusable in other queries, you can create a PostgreSQL function that can be used in a repeatable process:
CREATE OR REPLACE FUNCTION jsonb_diff(tags JSONB, previous_tags JSONB)
RETURNS JSONB LANGUAGE plpgsql AS $$
BEGIN
    RETURN jsonb_build_object(
        'removed', 
        (
            SELECT jsonb_object_agg(key, value)
            FROM (
                SELECT j1.key, j1.value
                FROM jsonb_each(previous_tags) j1
                LEFT JOIN jsonb_each(tags) j2 ON j1.key = j2.key
                WHERE j2.key IS NULL
            ) AS removed
        ),
        'added', 
        (
            SELECT jsonb_object_agg(key, value)
            FROM (
                SELECT j1.key, j1.value
                FROM jsonb_each(tags) j1
                LEFT JOIN jsonb_each(previous_tags) j2 ON j1.key = j2.key
                WHERE j2.key IS NULL
            ) AS added
        ),
        'changed', 
        (
            SELECT jsonb_object_agg(key, value)
            FROM (
                SELECT j1.key, j1.value
                FROM jsonb_each(tags) j1
                JOIN jsonb_each(previous_tags) j2 ON j1.key = j2.key
                WHERE j1.value <> j2.value
            ) AS changed
        )
    );
END;
$$;
This would then result in the following query for Lambda functions:
WITH prev_tag AS (
    SELECT 
        arn,
        _cq_sync_time,
        tags,
        LAG(tags) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_tags
    FROM 
        aws_lambda_functions
    ORDER BY 
        arn, _cq_sync_time
)
SELECT 
    _cq_sync_time,
    arn,
    tags, 
    previous_tags, 
    jsonb_diff(tags, previous_tags) AS tag_diff
FROM 
    prev_tag
WHERE previous_tags IS NOT NULL
ORDER BY arn, _cq_sync_time;
To adapt this query for tracking other AWS resources, simply replace aws_lambda_functions with any table available from our AWS Source Plugin. This will enable you to monitor tag changes over time for the selected resource.

Summary #

By leveraging CloudQuery to track changes in tags across all AWS resources, you can gain a deeper understanding of your cloud infrastructure’s organization and cost allocation. Whether you are using EC2 instances, Lambda functions, or S3 buckets, tracking tag changes over time allows you to maintain accurate resource management, ensure compliance, and optimize cost efficiency. Start syncing your AWS resources with CloudQuery today and enhance your resource management capabilities like never before.
Ready to learn more about your AWS resource tagging data? Try CloudQuery today and gain comprehensive insights into your AWS resources and their tags. Contact us or Join the CloudQuery Community to connect with other users and our engineering team, and explore the full potential of CloudQuery.
Jonathan Sarig

Written by Jonathan Sarig

Jonathan is a software engineer at CloudQuery with a particular interest in machine learning. He primarily works in golang but also has experience in (and a passion for) Rust.

Sync your cloud data now

Ingest your cloud data from hundreds of cloud and security tools to any destination.
No credit card required.