Tutorials
How to Manage AWS Resource and Usage Limits with CloudQuery and Service Quotas
Introduction #
Cloud services come with usage and resource limits that may cause adverse impact with cloud applications and infrastructure. Proper management of these usage and resource limits can prevent against availability issues when a resource cannot be created, a service is throttled, and helps with proactive management and troubleshooting of limit constraints that may impact active production applications and development environments.
In 2019, AWS introduced Service Quotas, a new service to help with managing these limits (now called quotas) from a central location.
For companies managing a handful of AWS accounts or to the scale of many accounts and even many Organizations, managing service limits is crucial to avoid scalability and inadvertent issues with limits. Examples such as platform infrastructure accounts that may contain core enterprise infrastructure, flagship AWS accounts that may contain critical business applications, and customer-facing accounts with many repeated infrastructure may all run into issues with service limits in AWS.
In this guide, we'll show how to use open source CloudQuery to augment existing capabilities from Amazon Web Services with additional data to better manage and monitor service limits in AWS.
Example Limit Scenarios #
Example scenarios we've seen where resource management can cause application issues or development challenges such as time spent troubleshooting (as of June 2023):
- Reaching the maximum number of S3 buckets in an account. The default limit of S3 buckets in an account is 100. Note, as of June 2023, S3 is not supported in Service Quotas and Service Quotas will only display the default limit in Console and via CLI even if the limit has been increased. We've reached out to AWS and there's a feature request for adding Service Quota support for S3.
- Reaching the maximum number of Lambda network interfaces. The default limit of network interfaces that Lambda creates for a VPC with functions attached is 250. Lambda creates a network interface for each combination of function subnets and security groups.
- Reaching KMS Key API Limits. KMS has API Limits that are adjustable such as CreateAlias and CreateKey request rates which by default are set to 5 per second. Some applications and enterprise architectures may require higher request rates.
- Reaching CloudFormation Stack Limits. Accounts by default have a default limit of 2000 Stacks. For teams with heavy infrastructure as code practices and in accounts with heavy infrastructure, teams could reach the maximum number of stacks.
- Reaching Identity and Access limits including managed policies per role. By default, roles can have a maximum number of 10 attached managed policies. For teams that use many managed policies to attach custom permissions to IAM principals, they may reach the limit of 10 managed policies and be unable to attach more policies or have failed attachments.
Other limits include EC2 with spot instances, networking, number of running dedicated hosts.
Managing Service Quotas #
We will go through a couple different methods for managing service limits. These options include using AWS-provided services, some of the current limitations of the services and coverage, as well as using CloudQuery to augment information provided by AWS.
One option provided by AWS is to utilize AWS Trusted Advisor (and AWS Support). AWS Trusted Advisor offers checks for service quotas with AWS Basic Support and AWS Developer Support Plans. AWS Business and Enterprise Support customers get all checks which include service quota checks. However, Trusted Advisor currently (as of June 2023) only supports 51 checks. Additionally, programmatic access to Trusted Advisor APIs (and Support) requires either a Business, Enterprise On-Ramp, or Enterprise Support Plan with AWS.
Another option is to integrate AWS Service Quotas with CloudWatch for alerting. However, these integrations only work with Service Quotas that support CloudWatch Alarms.
With CloudQuery, we'll demonstrate another option: how to use CloudQuery to manage and monitor resource utilization and service quota usage. We'll also use CloudQuery to sync information from Trusted Advisor, Support, and Service Quotas along with infrastructure data in AWS for a better understanding of service limits and utilization in AWS.
Walkthrough #
We'll focus on the following tables and services:
To start with the sync, check out the AWS Configuration Guide. We'll start with the following
aws.yml
source configuration file. We recommend adjusting your configuration file and tables synced to meet your specific needs.kind: source
spec:
# Source spec section
name: aws
path: cloudquery/aws
registry: cloudquery
version: "v30.2.0"
tables:
- aws_servicequotas_services
- aws_support_trusted_advisor_checks
- aws_support_cases
destinations: ["postgresql"]
spec:
# AWS Spec section described below
regions:
- us-east-1
accounts:
- id: "account1"
local_profile: "account1-profile"
aws_debug: false
Reference Queries #
We will first look for all Service Quotas that are adjustable. Service Quotas can either be a hard limit (non-adjustable) or a soft limit (adjustable).
SELECT * FROM aws_servicequotas_quotas
WHERE adjustable = TRUE;
Now, if you have either a Business, Enterprise On-Ramp, or Enterprise Support Plan with AWS, we can check the Trusted Advisor Service Limit checks programmatically. CloudQuery uses AWS APIs, so without those plans and the Trusted Advisor API enabled, this table will not be populated.
SELECT * FROM aws_support_trusted_advisor_checks
WHERE category='service_limits';
Now let's check our CloudFormation limits and augment that with statistics on our CloudFormation usage. For this query, we'll need to sync additional CloudFormation resources including
aws_cloudformation_stacks
.SELECT quotas.quota_code, quotas.service_code, quotas.quota_name,
quotas.value, quotas.region, quotas.account_id,
quotas.adjustable, quotas.global_quota, quotas.quota_arn,
COUNT(stacks.stack_id) as current_usage
FROM aws_servicequotas_quotas quotas
LEFT JOIN aws_cloudformation_stacks stacks
ON quotas.account_id = stacks.account_id
AND quotas.region = stacks.region
WHERE quota_code = 'L-0485CB21'
GROUP BY quotas.quota_code, quotas.service_code, quotas.quota_name,
quotas.value, quotas.region, quotas.account_id,
quotas.adjustable, quotas.global_quota, quotas.quota_arn;
We can repeat that for other use cases as well such as IAM. In this case, we'll also pull information on IAM account summary and other IAM resources such as IAM Policies. IAM limits are global limits, so we'll look for account usage.
SELECT quotas.*, accounts.policies
FROM aws_servicequotas_quotas quotas
LEFT JOIN aws_iam_accounts accounts
ON quotas.account_id = accounts.account_id
WHERE quota_code = 'L-E95E4862';
We can also look for additional use cases such as API limits. For this query, we'll need to sync
aws_cloudtrail_events
as a resource from AWS. This resource can take time to sync due to the sheer amount of data inNext Steps #
Additional insights can be made with data loaded from CloudQuery. For example,
CloudTrail events
.
can be used to check for API calls. CloudQuery can also be used to monitor service limit increases via Support Cases and customizable dashboards can be built with Grafana. We'd love to hear your use cases with CloudQuery!Summary #
We've now gone through how to use CloudQuery to monitor AWS resource limits and Service Quotas to supplement the information provided by AWS via Service Quotas, Trusted Advisor, and Support in one central place.
Ready to dive deeper? Contact CloudQuery here or join the CloudQuery Community to connect with other users and experts. You can also try out CloudQuery locally with our quick start guide or explore the CloudQuery Platform (currently in beta) for a more scalable solution.
Want help getting started? Join the CloudQuery community to connect with other users and experts, or message our team directly here if you have any questions.
References #
Written by Jason Kao
Jason worked as Head of Security Research and Solutions at CloudQuery and was a Senior Data Engineer prior to taking on that role. He focused on multi-cloud environments and has particular expertise on AWS.