Tutorials
Search Through Your Cloud Infrastructure with CloudQuery and Elasticsearch
Introduction #
Have you ever wished you could enter the ID of an EC2 instance into the search bar and get a list of all the related resources? Or search by tag across all your accounts and regions? Or search for resources by keyword? With the new CloudQuery Elasticsearch destination, you can do all this, and more.
Using the Elasticsearch destination, you can search through your cloud infrastructure, create visualizations and dashboards, and generally explore your infrastructure data in a way that wasn't possible before. In this article we will show you how to get started and a few examples of the things you can do.
Why Elasticsearch? #
Elasticsearch is a popular open source search engine built on top of Lucene. It is a distributed, RESTful search and analytics engine that is capable of storing and searching terabytes of data. As datasets grow over time, managing a SQL database can become more and more difficult, but Elasticsearch offers a great alternative here. It is schemaless and has built-in support for scaling, sharding and replication. Its full text search capabilities also make it a great fit for searching through unstructured data, especially JSON columns.
Getting Started #
First, you will need to have CloudQuery installed. If you haven't already, you can do so by following the instructions in the Quickstart guide.
You will also need a running Elasticsearch instance. If you don't have one, you can either create one locally using Docker, or use the Elastic Cloud service, which offers a free trial. This
docker-compose
file will get you started with local Elasticsearch and Kibana services (download the file, then run docker compose up
). Once it's up, you should be able to visit localhost:5601
and see the Kibana interface.Configuring the Elasticsearch Destination #
Now that you have CloudQuery and Elasticsearch running, you can configure the Elasticsearch destination. To do this, follow the instructions in the Elasticsearch destination plugin docs to create a configuration file called
elasticsearch.yml
.Assuming you want to browse AWS infrastructure data, you also need to create a configuration file for the AWS provider. Again, follow the instructions in the AWS Source guide to create a configuration file called
aws.yml
.Finally, you can run the CloudQuery command to sync your AWS infrastructure data to Elasticsearch:
cloudquery sync aws.yml elasticsearch.yml
Setting up the Data Views in Kibana #
When the sync completes, head over to the "Stack Management" section of Kibana and click on "Data Views" under "Kibana". Click "Create data view" and enter
aws*
as the index pattern. Select _cq_sync_time
as the time field (you can change this later, if you want). Click "Save data view to Kibana" to finish.Searching Through Your Cloud Infrastructure #
Now that you have your data in Elasticsearch, you can start searching through it. To do this, head over to the "Discover" section of Kibana and enter a search query. For example, we can search by IP address:
Or by tags:
Or we can use the native JSON object support to search through nested JSON objects. This query will find EC2 instances that have their monitoring state set to "disabled":
We can also build visualizations and dashboards using the data in Elasticsearch. For example, we can create a pie chart that shows the distribution of EC2 instance types:
(Note: To create this particular visualization with the current version of the AWS and Elasticsearch plugins, we had to change the type of the
instance_type
field from text
to keyword
in the "Index Patterns" section of Kibana. This can be done by first doing a migration, which creates the index templates, editing the template, and then running a new sync.)Building a Historical View of Your Infrastructure #
One of the great things about Elasticsearch is that it is designed to be used with time series data, and therefore a final thing we wanted to note in this post is the ability to view historical snapshots of your infrastructure from Elasticsearch. The CloudQuery Elasticsearch destination currently supports three write modes:
overwrite
, overwrite-delete-stale
(the default) and append
. You can read more about these in the plugin documentation, but in short, when using append
mode, CloudQuery will never delete data, so you can build a historical view of your infrastructure over time. For example, by running a sync in append
mode every day, you can build a visualization that shows the number of EC2 instances over time, or more generally track how (or whether!) your organization's security posture is improving over time.Conclusion #
In this article we showed you how to get started with the new CloudQuery Elasticsearch destination, now available in Preview. We also showed you a few examples of the things you can do with the data in Elasticsearch in combination with the AWS source plugin, including searching through your cloud infrastructure, building visualizations and dashboards, and building a historical view of your infrastructure. But don't think that it's limited to AWS! The Elasticsearch destination works with all CloudQuery source plugins, so you can search through your Azure, GCP, Kubernetes, or even Marketing and FinOps data as well. We hope you find this new destination useful, and we look forward to seeing what you build with it!
Ready to get started with CloudQuery? You can try out CloudQuery locally with our quick start guide or explore the CloudQuery Platform (currently in beta) for a more scalable solution.
Want help getting started? Join the CloudQuery community to connect with other users and experts, or message our team directly here if you have any questions.
Written by Herman Schaaf
Herman is the Director of Engineering at CloudQuery and an Apache Arrow contributor. A polyglot with a preference for Go and Python, he has spoken at QCon London and Data Council New York.