
Add observability to your CloudQuery sync runs with Datadog

Erez Rokah

In this guide, we discuss how to use Datadog to add observability to CloudQuery sync jobs. A common use case for CloudQuery is to run it as a scheduled job that syncs cloud resources to a data warehouse, or as multiple jobs, each with a different configuration. To get insight into how those sync jobs run, we can use Datadog to monitor them and send alerts when something goes wrong.

Prerequisites #

  • A Datadog account. See here for getting started
  • The Datadog agent installed on the machine(s) running the CloudQuery sync jobs. See here for more information
  • Log collection enabled in the Datadog agent configuration. See here for more information

Adding a Custom Log Source for CloudQuery Logs #

First, we recommend passing the --log-format json flag to the CloudQuery sync command. This ensures that the logs are written in a structured format that Datadog can parse. Datadog can collect logs in non-JSON formats as well, but using JSON ensures that we can query the logs by their fields.
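Before pointing Datadog at the log file, it can help to confirm that the file really contains one JSON object per line. The following is a minimal sketch of such a check, not part of CloudQuery or Datadog. The function name `split_json_lines` and the sample lines are hypothetical and exist only for illustration.

```python
import json


def split_json_lines(lines):
    """Split lines into (parsed, unparsed).

    parsed   -> list of dicts, one per line that is a JSON object
    unparsed -> list of raw strings that are not JSON objects
    """
    parsed, unparsed = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            unparsed.append(line)
            continue
        if isinstance(record, dict):
            parsed.append(record)
        else:
            unparsed.append(line)
    return parsed, unparsed


# Hypothetical sample lines; real CloudQuery log lines will have other fields.
sample = [
    '{"level": "info", "message": "table sync finished", "table": "aws_s3_buckets"}',
    "2024-01-01 INFO table sync finished",
]
parsed, unparsed = split_json_lines(sample)
print(len(parsed), "JSON lines,", len(unparsed), "non-JSON lines")
```

If the second count is not zero for a real log file, the sync was most likely run without the --log-format json flag.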
Now, let's create the custom log source in Datadog:
  1. Locate the Datadog conf.d directory. Its location depends on your OS and is listed in the Datadog documentation here.
  2. Under the conf.d directory create a directory named cloudquery.d
  3. Under the cloudquery.d directory create a file named conf.yaml with the following content:
logs:
  - type: "file"
    path: "<full path to the CloudQuery log file>" # Example path: /var/log/cloudquery/cloudquery.log
    service: "CloudQuery"
    source: "CloudQuery"
  4. Restart the Datadog agent for the changes to take effect.
You can find more information on setting up a custom log source in Datadog here
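If you manage several machines, steps 2 and 3 above can be scripted. The sketch below creates the cloudquery.d directory and writes conf.yaml with the same content as shown earlier. The helper name `write_cloudquery_log_source` is hypothetical. The conf.d location is passed in as an argument because it differs by OS, and the example writes to a temporary directory rather than a real Datadog installation.

```python
import tempfile
from pathlib import Path

# Same content as the conf.yaml shown above, with the log path filled in.
# Assumption: the path has no characters that need escaping inside a
# double-quoted YAML string (for example, backslashes in Windows paths).
CONF_TEMPLATE = """\
logs:
  - type: "file"
    path: "{log_path}"
    service: "CloudQuery"
    source: "CloudQuery"
"""


def write_cloudquery_log_source(conf_d, log_path):
    """Create <conf_d>/cloudquery.d/conf.yaml and return its path."""
    target_dir = Path(conf_d) / "cloudquery.d"
    target_dir.mkdir(parents=True, exist_ok=True)
    conf_file = target_dir / "conf.yaml"
    conf_file.write_text(CONF_TEMPLATE.format(log_path=log_path), encoding="utf-8")
    return conf_file


# Demonstration against a throwaway directory instead of the real conf.d.
with tempfile.TemporaryDirectory() as tmp:
    conf_file = write_cloudquery_log_source(tmp, "/var/log/cloudquery/cloudquery.log")
    print(conf_file.read_text(encoding="utf-8"))
```

Writing to the real conf.d directory usually requires elevated permissions, and the agent still has to be restarted afterwards as described in step 4.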

Using Datadog to Query CloudQuery Logs #

Once you run a CloudQuery sync job, you should see the logs in the Datadog logs explorer.
If you don't see any logs, you might need to choose a different time range in the Datadog logs explorer.
Because the logs are in JSON format, we can query them in Datadog right away.
For example, the @errors:>0 @table:* query returns the log entries for all tables that had errors during the sync job.
And the @resources:>10000 @table:* query returns the log entries for all tables that synced more than 10,000 resources during the sync job.
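To make the two queries concrete, here is a rough local approximation in Python that applies the same conditions to JSON log lines. It is a sketch for building intuition, not how Datadog evaluates queries. The field names table, errors and resources are taken from the queries above, while the helper name `tables_matching` and the sample log lines are hypothetical.

```python
import json


def tables_matching(lines, field, threshold):
    """Return sorted table names from lines where `field` > `threshold`.

    Lines that are not JSON objects, or that have no "table" field,
    are skipped, which mirrors the `@table:*` part of the query.
    """
    tables = set()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict) or "table" not in record:
            continue
        value = record.get(field)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value > threshold:
            tables.add(str(record["table"]))
    return sorted(tables)


# Hypothetical log lines; real entries carry additional fields.
sample = [
    '{"table": "aws_ec2_instances", "resources": 25000, "errors": 0}',
    '{"table": "aws_s3_buckets", "resources": 120, "errors": 3}',
    '{"message": "sync started"}',
]
print(tables_matching(sample, "errors", 0))         # like @errors:>0 @table:*
print(tables_matching(sample, "resources", 10000))  # like @resources:>10000 @table:*
```

With the sample lines above, the first call prints ['aws_s3_buckets'] and the second prints ['aws_ec2_instances'].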

Summary and Next Steps #

In this guide we created a custom log source in Datadog to collect CloudQuery logs and query them.
Datadog is a powerful tool for monitoring and observability, and we have only scratched the surface of what you can do with it. We recommend reading the Datadog documentation to learn more about its capabilities, such as creating monitors and dashboards.
Ready to get started with CloudQuery? You can try out CloudQuery locally with our quick start guide or explore the CloudQuery Platform (currently in beta) for a more scalable solution.
Want help getting started? Join the CloudQuery community to connect with other users and experts, or message our team directly here if you have any questions.
Written by Erez Rokah

I'm a security oriented open source maintainer. I joined the CloudQuery team in April 2022 to focus on building a developer first, open source, high performance data integration platform for security and infrastructure teams.


© 2024 CloudQuery, Inc. All rights reserved.
