
Tracking Changes in AWS Resource Capacity Over Time

Jonathan Sarig

Keeping track of your AWS resource capacity is crucial for maintaining optimal performance and cost-efficiency. DynamoDB, Elastic Compute Cloud (EC2), Elastic Block Store (EBS) volumes, and Lambda functions in AWS frequently undergo significant changes that affect performance and costs. This is where CloudQuery comes in handy. CloudQuery allows you to sync and query your AWS cloud assets with SQL, providing a simple and flexible way to track changes to your AWS resources and manage your cloud infrastructure effectively.
With CloudQuery, you can sync your AWS resources using our AWS Source Plugin to any destination available in our destination plugins. Our most popular destination is PostgreSQL, so the query examples below use the PostgreSQL Destination Plugin.
To get started, download the CloudQuery CLI and set up your CloudQuery configuration file by following the AWS Source Plugin documentation. When writing your CloudQuery AWS configuration file, make sure that you add aws_dynamodb_tables, aws_ec2_ebs_volumes, and aws_lambda_functions to the list of tables being synced.
If you want to track historical usage trends of your AWS resources, be sure that you sync your resources in append mode in your database for at least 30 days to gather sufficient data and derive meaningful insights. By consistently monitoring these metrics, you can proactively manage your cloud infrastructure, optimize resource allocation, and ensure your applications run smoothly without unexpected interruptions.
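For reference, here is a minimal configuration sketch for this setup; the plugin versions and the connection string are placeholders, so check the AWS Source Plugin and PostgreSQL Destination Plugin documentation for current values:
kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "vX.Y.Z" # placeholder; use a current release
  tables:
    - aws_dynamodb_tables
    - aws_ec2_ebs_volumes
    - aws_lambda_functions
  destinations: ["postgresql"]
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  version: "vX.Y.Z" # placeholder; use a current release
  write_mode: "append" # keep historical rows so trends can be queried later
  spec:
    connection_string: "${POSTGRESQL_CONNECTION_STRING}" # placeholder; set via environment
With this file in place, run cloudquery sync on a schedule (the examples below assume a sync every 12 hours) so that each run appends a fresh snapshot of these tables.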

How to Track Capacity Changes Over Time for DynamoDB Tables #

When discussing DynamoDB capacity, we refer to read and write capacity units, which you can configure manually or set to autoscale based on your application’s needs.
After syncing data from AWS, CloudQuery creates a table named aws_dynamodb_tables. This table holds crucial data regarding the capacity units for each DynamoDB table.
To analyze capacity changes, you can use the SQL LAG() function, partitioning by the unique resource identifier, the Amazon Resource Name (ARN), and ordering by sync time. This approach lets you observe the change at each sync point.
The LAG() function returns a value from a preceding row, where "preceding" depends on the ordering you specify (in this case the _cq_sync_time column). The partition clause first groups the data so that LAG() only looks at rows within the same group (in this case, rows with the same ARN).
SELECT 
    ARN,
    _cq_sync_time,
    provisioned_throughput->>'ReadCapacityUnits' AS current_read_capacity,
    provisioned_throughput->>'WriteCapacityUnits' AS current_write_capacity,
    LAG((provisioned_throughput->>'ReadCapacityUnits')::integer) OVER (PARTITION BY ARN ORDER BY _cq_sync_time) AS previous_read_capacity,
    LAG((provisioned_throughput->>'WriteCapacityUnits')::integer) OVER (PARTITION BY ARN ORDER BY _cq_sync_time) AS previous_write_capacity,
    ((provisioned_throughput->>'ReadCapacityUnits')::integer - LAG((provisioned_throughput->>'ReadCapacityUnits')::integer) OVER (PARTITION BY ARN ORDER BY _cq_sync_time)) AS read_capacity_change,
    ((provisioned_throughput->>'WriteCapacityUnits')::integer - LAG((provisioned_throughput->>'WriteCapacityUnits')::integer) OVER (PARTITION BY ARN ORDER BY _cq_sync_time)) AS write_capacity_change
FROM 
    aws_dynamodb_tables
ORDER BY 
    ARN, _cq_sync_time;
In this example, data is synced every 12 hours, which results in a table like this:
| ARN | _cq_sync_time | current_read_capacity | current_write_capacity | previous_read_capacity | previous_write_capacity | read_capacity_change | write_capacity_change |
|---|---|---|---|---|---|---|---|
| arn:aws:dynamodb:REGION:ACCOUNT/example1 | 2024-06-03 09:00:00 | 14 | 17 | 15.0 | 15.0 | -1.0 | 2.0 |
| arn:aws:dynamodb:REGION:ACCOUNT/example1 | 2024-06-05 09:00:00 | 14 | 18 | 14.0 | 17.0 | 0.0 | 2.0 |
| arn:aws:dynamodb:REGION:ACCOUNT/example2 | 2024-06-03 09:00:00 | 14 | 13 | 16.0 | 20.0 | 1.0 | -1.0 |
| arn:aws:dynamodb:REGION:ACCOUNT/example2 | 2024-06-05 09:00:00 | 17 | 16 | 14.0 | 13.0 | 1.0 | 1.0 |
| arn:aws:dynamodb:REGION:ACCOUNT/example3 | 2024-06-03 09:00:00 | 16 | 16 | 15.0 | 15.0 | 1.0 | 1.0 |
| arn:aws:dynamodb:REGION:ACCOUNT/example3 | 2024-06-05 09:00:00 | 20 | 22 | 18.0 | 20.0 | 2.0 | 2.0 |
Visualizing this data can provide immediate insights into which of your tables have experienced an increase in capacity.
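If you chart this history in a dashboard or BI tool, it can help to wrap the per-sync capacity values in a view so the tool can select from it directly. The sketch below does this; the view name dynamodb_capacity_history is arbitrary:
-- Hypothetical convenience view over the synced history (name is arbitrary).
CREATE VIEW dynamodb_capacity_history AS
SELECT 
    arn,
    _cq_sync_time,
    (provisioned_throughput->>'ReadCapacityUnits')::integer AS read_capacity,
    (provisioned_throughput->>'WriteCapacityUnits')::integer AS write_capacity
FROM 
    aws_dynamodb_tables;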
You can run the query below to identify which DynamoDB tables have grown the most in capacity over the past 30 days. Like the previous query, it uses the SQL LAG() function, but it then groups the data by ARN and sums the per-sync changes, letting you see the combined change over time.
WITH capacity_changes AS (
    SELECT 
        ARN,
        _cq_sync_time,
        (provisioned_throughput->>'ReadCapacityUnits')::integer AS current_read_capacity,
        (provisioned_throughput->>'WriteCapacityUnits')::integer AS current_write_capacity,
        LAG((provisioned_throughput->>'ReadCapacityUnits')::integer) OVER (PARTITION BY ARN ORDER BY _cq_sync_time) AS previous_read_capacity,
        LAG((provisioned_throughput->>'WriteCapacityUnits')::integer) OVER (PARTITION BY ARN ORDER BY _cq_sync_time) AS previous_write_capacity,
        ((provisioned_throughput->>'ReadCapacityUnits')::integer - LAG((provisioned_throughput->>'ReadCapacityUnits')::integer) OVER (PARTITION BY ARN ORDER BY _cq_sync_time)) AS read_capacity_change,
        ((provisioned_throughput->>'WriteCapacityUnits')::integer - LAG((provisioned_throughput->>'WriteCapacityUnits')::integer) OVER (PARTITION BY ARN ORDER BY _cq_sync_time)) AS write_capacity_change
    FROM 
        aws_dynamodb_tables
    WHERE 
        _cq_sync_time >= NOW() - INTERVAL '30 days'
)
SELECT 
    ARN,
    SUM(read_capacity_change) AS total_read_capacity_change,
    SUM(write_capacity_change) AS total_write_capacity_change,
    SUM(read_capacity_change + write_capacity_change) AS total_capacity_change
FROM 
    capacity_changes
GROUP BY 
    ARN
ORDER BY 
    total_capacity_change DESC;
The resulting table would look something like this:
| arn | total_read_capacity_change | total_write_capacity_change | total_capacity_change |
|---|---|---|---|
| arn:aws:dynamodb:REGION:ACCOUNT/EXAMPLE3 | 24.0 | 9.0 | 33.0 |
| arn:aws:dynamodb:REGION:ACCOUNT/EXAMPLE1 | 11.0 | 12.0 | 23.0 |
| arn:aws:dynamodb:REGION:ACCOUNT/EXAMPLE2 | 14.0 | 7.0 | 21.0 |
This table shows that the table EXAMPLE3 has seen the largest combined increase in read and write capacity units over the past 30 days. This information is crucial for optimizing resource allocation and ensuring your DynamoDB tables are scaled appropriately.
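Capacity that shrinks can be just as informative as capacity that grows, since a sustained decrease often reflects over-provisioning being corrected. As a sketch, a small variation of the same query flips the aggregation to surface tables whose combined provisioned capacity went down over the last 30 days:
-- Sketch: tables whose combined provisioned capacity decreased over the last 30 days.
WITH capacity_changes AS (
    SELECT 
        arn,
        ((provisioned_throughput->>'ReadCapacityUnits')::integer - LAG((provisioned_throughput->>'ReadCapacityUnits')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS read_capacity_change,
        ((provisioned_throughput->>'WriteCapacityUnits')::integer - LAG((provisioned_throughput->>'WriteCapacityUnits')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS write_capacity_change
    FROM 
        aws_dynamodb_tables
    WHERE 
        _cq_sync_time >= NOW() - INTERVAL '30 days'
)
SELECT 
    arn,
    SUM(read_capacity_change + write_capacity_change) AS total_capacity_change
FROM 
    capacity_changes
GROUP BY 
    arn
HAVING 
    SUM(read_capacity_change + write_capacity_change) < 0
ORDER BY 
    total_capacity_change ASC;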

Identifying AWS Lambda Functions with the Most Growth in Memory and Storage Over the Last 30 Days #

AWS Lambda is a serverless computing service that allows you to run code without provisioning or managing servers. You pay only for the compute time you consume. Lambda functions can be configured with different memory sizes and automatically provide a fixed amount of CPU power proportional to the memory configured. Additionally, you can configure ephemeral storage for temporary data. Tracking changes in memory and storage configurations is essential for performance tuning and cost management.
To analyze changes in memory and storage sizes for your Lambda functions, you can use a SQL query similar to the one you used above for DynamoDB table capacity. The query below tracks changes over time using the SQL LAG() function and partitions the data by ARN.
SELECT 
    arn,
    _cq_sync_time,
    (configuration->>'MemorySize')::integer AS current_memory_size,
    LAG((configuration->>'MemorySize')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_memory_size,
    ((configuration->>'MemorySize')::integer - LAG((configuration->>'MemorySize')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS memory_size_change,
    (configuration->'EphemeralStorage'->>'Size')::integer AS current_storage_size,
    LAG((configuration->'EphemeralStorage'->>'Size')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_storage_size,
    ((configuration->'EphemeralStorage'->>'Size')::integer - LAG((configuration->'EphemeralStorage'->>'Size')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS storage_size_change
FROM 
    aws_lambda_functions
ORDER BY 
    arn, _cq_sync_time;
This will result in a table like this:
| arn | _cq_sync_time | current_memory_size | current_storage_size | previous_memory_size | previous_storage_size | memory_size_change | storage_size_change |
|---|---|---|---|---|---|---|---|
| arn:aws:lambda:REGION:ACCOUNT:function | 2024-06-01 09:00:00 | 128 | 512 | NaN | NaN | NaN | NaN |
| arn:aws:lambda:REGION:ACCOUNT:function | 2024-06-02 09:00:00 | 256 | 528 | 128.0 | 512.0 | 128.0 | 16.0 |
| arn:aws:lambda:REGION:ACCOUNT:function | 2024-06-03 09:00:00 | 224 | 560 | 256.0 | 528.0 | -32.0 | 32.0 |
| arn:aws:lambda:REGION:ACCOUNT:function | 2024-06-01 09:00:00 | 128 | 512 | NaN | NaN | NaN | NaN |
| arn:aws:lambda:REGION:ACCOUNT:function | 2024-06-02 09:00:00 | 256 | 544 | 128.0 | 512.0 | 128.0 | 32.0 |
| arn:aws:lambda:REGION:ACCOUNT:function | 2024-06-03 09:00:00 | 320 | 560 | 256.0 | 544.0 | 64.0 | 16.0 |
This data can then be visualized in the same way as the DynamoDB table data.
Just as you previously identified the DynamoDB tables with the most capacity growth, you can identify which Lambda functions grew the most in memory and storage. The query below returns this result, sorted by the largest total memory size change:
WITH memory_storage_changes AS (
    SELECT 
        arn,
        _cq_sync_time,
        (configuration->>'MemorySize')::integer AS current_memory_size,
        LAG((configuration->>'MemorySize')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_memory_size,
        ((configuration->>'MemorySize')::integer - LAG((configuration->>'MemorySize')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS memory_size_change,
        (configuration->'EphemeralStorage'->>'Size')::integer AS current_storage_size,
        LAG((configuration->'EphemeralStorage'->>'Size')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_storage_size,
        ((configuration->'EphemeralStorage'->>'Size')::integer - LAG((configuration->'EphemeralStorage'->>'Size')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS storage_size_change
    FROM 
        aws_lambda_functions
    WHERE 
        _cq_sync_time >= NOW() - INTERVAL '30 days'
)
SELECT 
    arn,
    SUM(memory_size_change) AS total_memory_size_change,
    SUM(storage_size_change) AS total_storage_size_change
FROM 
    memory_storage_changes
GROUP BY 
    arn
ORDER BY 
    total_memory_size_change DESC;
This results in a table like this:
| arn | total_memory_size_change | total_storage_size_change |
|---|---|---|
| arn:aws:lambda:REGION:ACCOUNT/EXAMPLE1 | 384 | 512 |
| arn:aws:lambda:REGION:ACCOUNT/EXAMPLE2 | 0 | 0 |
| arn:aws:lambda:REGION:ACCOUNT/EXAMPLE3 | 0 | 0 |
From this table, you can identify which functions grew the most over the last 30 days in terms of memory and storage capacity.
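If most of your functions never change, the aggregated results will be dominated by rows of zeros. One way to cut through that noise, sketched below, is to add a HAVING clause so only functions with an actual configuration change are returned:
-- Sketch: only Lambda functions whose memory or ephemeral storage changed in the last 30 days.
WITH memory_storage_changes AS (
    SELECT 
        arn,
        ((configuration->>'MemorySize')::integer - LAG((configuration->>'MemorySize')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS memory_size_change,
        ((configuration->'EphemeralStorage'->>'Size')::integer - LAG((configuration->'EphemeralStorage'->>'Size')::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS storage_size_change
    FROM 
        aws_lambda_functions
    WHERE 
        _cq_sync_time >= NOW() - INTERVAL '30 days'
)
SELECT 
    arn,
    SUM(memory_size_change) AS total_memory_size_change,
    SUM(storage_size_change) AS total_storage_size_change
FROM 
    memory_storage_changes
GROUP BY 
    arn
HAVING 
    COALESCE(SUM(memory_size_change), 0) <> 0 OR COALESCE(SUM(storage_size_change), 0) <> 0
ORDER BY 
    total_memory_size_change DESC;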

How to Analyze Changes in Size and Throughput for AWS EBS Volumes #

Amazon Elastic Block Store (EBS) provides persistent block storage for Amazon EC2 instances. EBS volumes are designed to offer high availability and durability, making them suitable for a wide range of workloads. When working with EBS volumes, two critical aspects to monitor are size and throughput:
  • Size: This refers to the allocated storage capacity of the EBS volume. Tracking changes in volume size is crucial for cost management and ensuring that your storage meets the demands of your applications.
  • Throughput: This measures the data transfer rate to and from the EBS volume. Monitoring throughput helps in optimizing performance, especially for applications that require high I/O operations.
By using CloudQuery, you can track these changes over time, allowing you to manage your EBS volumes more effectively and ensure optimal performance and cost-efficiency.
To analyze changes in size and throughput for EBS volumes, you can use a SQL query similar to the ones you used for DynamoDB tables and Lambda functions. The query tracks changes over time using the LAG() function and partitions the data by ARN.
SELECT 
    arn,
    _cq_sync_time,
    (size)::integer AS current_size,
    LAG((size)::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_size,
    ((size)::integer - LAG((size)::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS size_change,
    (throughput)::integer AS current_throughput,
    LAG((throughput)::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_throughput,
    ((throughput)::integer - LAG((throughput)::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS throughput_change
FROM 
    aws_ec2_ebs_volumes
ORDER BY 
    arn, _cq_sync_time;
This query results in a table similar to this:
| arn | _cq_sync_time | current_size | current_throughput | previous_size | previous_throughput | size_change | throughput_change |
|---|---|---|---|---|---|---|---|
| arn:aws:ec2:REGION:ACCOUNT/example1 | 2024-06-01 09:00:00 | 100 | 100 | NaN | NaN | NaN | NaN |
| arn:aws:ec2:REGION:ACCOUNT/example1 | 2024-06-02 09:00:00 | 125 | 150 | 100.0 | 100.0 | 25.0 | 50.0 |
| arn:aws:ec2:REGION:ACCOUNT/example1 | 2024-06-03 09:00:00 | 150 | 170 | 125.0 | 150.0 | 25.0 | 20.0 |
| arn:aws:ec2:REGION:ACCOUNT/example2 | 2024-06-01 09:00:00 | 100 | 100 | NaN | NaN | NaN | NaN |
| arn:aws:ec2:REGION:ACCOUNT/example2 | 2024-06-02 09:00:00 | 100 | 150 | 100.0 | 100.0 | 0.0 | 50.0 |
| arn:aws:ec2:REGION:ACCOUNT/example2 | 2024-06-03 09:00:00 | 110 | 200 | 100.0 | 150.0 | 10.0 | 50.0 |
With a small modification to the queries written previously, you can find the volumes that grew the most:
WITH size_throughput_changes AS (
    SELECT 
        arn,
        _cq_sync_time,
        (size)::integer AS current_size,
        LAG((size)::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_size,
        ((size)::integer - LAG((size)::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS size_change,
        (throughput)::integer AS current_throughput,
        LAG((throughput)::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time) AS previous_throughput,
        ((throughput)::integer - LAG((throughput)::integer) OVER (PARTITION BY arn ORDER BY _cq_sync_time)) AS throughput_change
    FROM 
        aws_ec2_ebs_volumes
    WHERE 
        _cq_sync_time >= NOW() - INTERVAL '30 days'
)
SELECT 
    arn,
    SUM(size_change) AS total_size_change,
    SUM(throughput_change) AS total_throughput_change
FROM 
    size_throughput_changes
GROUP BY 
    arn
ORDER BY 
    total_size_change DESC;
This results in a table where you can identify which volumes grew and which didn't change:
| arn | total_size_change | total_throughput_change |
|---|---|---|
| arn:aws:ec2:REGION:ACCOUNT/EXAMPLE1 | 67 | 5 |
| arn:aws:ec2:REGION:ACCOUNT/EXAMPLE2 | 0 | 0 |
| arn:aws:ec2:REGION:ACCOUNT/EXAMPLE3 | 0 | 0 |
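The 30-day deltas are easier to act on when you also know each volume's current configuration. The sketch below uses PostgreSQL's DISTINCT ON to keep only the most recent synced row per volume, which you can compare against the totals above:
-- Sketch: latest observed size and throughput per EBS volume.
SELECT DISTINCT ON (arn)
    arn,
    _cq_sync_time AS last_synced,
    (size)::integer AS latest_size,
    (throughput)::integer AS latest_throughput
FROM 
    aws_ec2_ebs_volumes
ORDER BY 
    arn, _cq_sync_time DESC;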

Summary #

By leveraging CloudQuery to track changes in AWS resources like DynamoDB, Lambda functions, and EBS volumes, you can gain insights into your cloud infrastructure’s performance and cost dynamics. Start syncing your AWS resources with CloudQuery today and take control of your cloud infrastructure like never before.
Ready to dive deeper into your AWS resource capacity data? Try CloudQuery for free today and gain comprehensive insights into your AWS resources. Contact us or join the CloudQuery Community to connect with other users and our engineering team, and explore the full potential of CloudQuery.

Written by Jonathan Sarig

Jonathan is a software engineer at CloudQuery with a particular interest in machine learning. He primarily works in golang but also has experience in (and a passion for) Rust.
