How to Build an Open Source CSPM with CloudQuery, PostgreSQL, and Grafana

Security

Tutorials

How to Build an Open Source CSPM with CloudQuery, PostgreSQL, and Grafana

•

In this guide, we will walk you through how to set up CloudQuery to build your customizable compliance, Cloud Security Posture Management (CSPM) dashboard with PostgreSQL and Grafana.

CSPMs are probably the biggest offenders of yet another dashboard syndrome, and at CloudQuery we believe it’s time to unbundle those, applying the best practices in data engineering and the modern data stack to cloud security and compliance.

General Architecture of the CSPM #

ETL (Extract-Transform-Load) Ingestion Layer: This is the part where data is collected, processed, and loaded into the system. CloudQuery is the tool used here to gather data from various cloud sources and prepare it for storage.
Datastore: This is where the collected data is stored. PostgreSQL, a type of database, is used to keep all the cloud data organized and accessible.
Transformations (Policies): After the data is stored, it often needs to be processed or transformed to be useful. This is done using dbt Transformations, which applies rules and policies to the data to make it ready for analysis.
Data Visualization and Exploration Platform: Finally, to make sense of the data, it’s visualized using Grafana. Grafana helps create cloud dashboards and reports so users can easily explore and understand the cloud data.

What you will get from this setup #

Raw SQL access to all your cloud asset inventory.
Multi-Cloud Asset Inventory: Ingest configuration from all your clouds to a single Datastore with a unified structure.
Avoid 'yet another dashboard' fatigue: Reuse your existing BI/Visualization stack (Grafana in this example) to build an open-source CSPM.

Building the CSPM step-by-step #

Step 1: Install or deploy CloudQuery #

To get started with CloudQuery visit our download page.

If you are already familiar with CloudQuery, you can deploy it with Kubernetes, on AWS ECS using Fargate or with Google Cloud Run. For more deployment guides, see our Deployment Overview.

Step 2: Install Grafana #

Grafana is a well-known open-source observability and visualization tool. It is open source, so there are a number of ways to deploy it:

Self-hosted (local, docker, k8s): Official guide.
SaaS/managed: Grafana.com
AWS Managed Grafana: https://aws.amazon.com/grafana/

Step 3: Install dbt #

CloudQuery policies and rules are implemented using dbt, a popular open-source tool for data transformation.

After the installation, you will need to provide dbt with a profile that defines how to connect to relevant databases. By default, it searches for a profiles.yml file in the local directory and falls back to ~/.dbt/. Read more about profiles in the dbt documentation.

Your profiles.yml file should look like this:

config:
  send_anonymous_usage_stats: False
  use_colors: True

aws_compliance: # this should match the profile name in your dbt_project.yml, see step 5.
  target: postgres
  outputs:
    postgres:
      type: postgres
      host: 'your postgres host'
      user: 'postgres user name'
      pass: 'postgres password'
      port: 5432
      dbname: 'database name'
      schema: public
      threads: 4

Step 4: Configure and run CloudQuery sync #

You will need to configure CloudQuery to sync your cloud assets to your PostgreSQL instance. For AWS assets, you will need the AWS Plugin. Policies are also available for GCP, Azure, and Kubernetes.

To fully define the configuration, you will need to specify what tables to sync. This depends on the policy you want to use. For example, to use AWS Compliance (free), you will need to sync the following tables:

- aws_cloudwatch_alarms
- aws_cloudwatchlogs_metric_filters
- aws_ec2_network_acls
- aws_ec2_security_groups
- aws_sns_subscriptions
- aws_iam_credential_reports
- aws_iam_password_policies
- aws_iam_user_access_keys
- aws_iam_users
- aws_autoscaling_groups
- aws_cloudtrail_trail_event_selectors
- aws_cloudtrail_trails
- aws_codebuild_projects
- aws_config_configuration_recorders
- aws_apigateway_rest_api_stages
- aws_apigateway_rest_apis
- aws_apigatewayv2_api_routes
- aws_apigatewayv2_api_stages
- aws_apigatewayv2_apis
- aws_cloudfront_distributions
- aws_efs_access_points
- aws_elasticbeanstalk_environments
- aws_elbv1_load_balancers
- aws_elbv2_load_balancer_attributes
- aws_elbv2_load_balancers
- aws_iam_accounts
- aws_rds_clusters
- aws_s3_accounts

To see what other transformations are available and what tables are required, visit the individual Transformations documentation pages.

Step 5: Run policies (CSPM - Cloud Security Posture Management) #

Download the AWS Compliance (free) and extract the package.

Navigate to the extracted directory where the dbt project file (dbt_project.yml) resides. Check that the profile: aws_compliance matches the profile name in your profiles.yml file that you configured in step 3.

Before executing the dbt run command, it might be useful to check for any potential issues:

dbt compile

If everything compiles without errors, you can then execute:

dbt run

This command will run all the dbt models and create views in your destination database as defined in the models.

Now you can query the views directly and export in various formats such as CSV or HTML, all with standard psql, and of course visualize them in your favorite BI tool.

You can repeat this for other cloud providers by adding additional plugins to your syncs and running the relevant transformations.

Step 6: Visualize in Grafana #

For AWS Transformations, we offer free Grafana dashboards that you can use as a starting point.

Download AWS Compliance Dashboard and extract the zip file. Find the dashboard.json file in the extracted directory (in aws_compliance/grafana/postgres) and import it into your Grafana instance.

At the top of the dashboard, select the data source to be the PostgreSQL database with data synced by CloudQuery.

Now you should see a dashboard similar to this:

See our Hub for additional dashboards.

Summary #

tl;dr: CloudQuery collects and prepares the data from your cloud provider, PostgreSQL stores it, dbt transforms it, and Grafana visualizes it.

That’s it! Now you have a fully functional setup (or any other SPM) with access to raw data available and stored in your PostgreSQL. Don’t forget to run the syncs regularly to get fresh data. You only need to run the transformations when you want to update to a newer version.

If you’re interested in checking out a better way to build, try CloudQuery. You can download and use CloudQuery and follow along with our quick start guide, or explore the CloudQuery Platform for a more scalable solution.

Got feedback or suggestions? Join the CloudQuery community to connect with other users and experts, or message our team directly here if you have any questions.

CloudQuery

Test CloudQuery's capabilities with a demo

How to Build an Open Source CSPM with CloudQuery, PostgreSQL, and Grafana

General Architecture of the CSPM #

What you will get from this setup #

Building the CSPM step-by-step #

Step 1: Install or deploy CloudQuery #

Step 2: Install Grafana #

Step 3: Install dbt #

Step 4: Configure and run CloudQuery sync #

Step 5: Run policies (CSPM - Cloud Security Posture Management) #

Step 6: Visualize in Grafana #

Summary #

Related posts

Running Embedded ELT with CloudQuery and Snowpark Container Service