CloudQuery vs Steampipe: A Comprehensive Comparison
The landscape of data integration and movement is constantly shifting, and choosing the right ELT or ETL tool depends on your requirements, needs, and available resources. This blog post compares the pros and cons of CloudQuery and Steampipe.
What is CloudQuery?
CloudQuery is an open-source, cross-language, high-performance ELT (Extract-Load-Transform) framework powered by Apache Arrow. It is extremely fast and easy to run both locally and in the cloud. It has a CLI-first design, ships as a single binary, and doesn't need any additional services or UI to run.
What is Steampipe?
Built by Turbot, Steampipe is an open-source ETL (Extract-Transform-Load) framework for auditing cloud and network infrastructure. Its major differentiating factor is real-time queries: each SQL request is translated into live API calls. This keeps results up to date with the latest state, but at the cost of significantly slower response times and a massive increase in API calls. Steampipe is available both as a single binary and through Turbot Pipes (essentially Steampipe-as-a-Service).
Comparison Overview
| Feature | CloudQuery | Steampipe |
|---|---|---|
| Architecture | Pluggable architecture powered by gRPC and Apache Arrow; CLI-first and shipped as a single binary that can run anywhere | Pluggable architecture with a core engine that translates APIs into tables, with support for Postgres Foreign Data Wrappers |
| Custom source or destination development | Any language (Golang, Python, JavaScript, Java), with more coming | Golang |
| Sources / connectors | 97 (focused on cloud infrastructure) | 140 (focused on cloud infrastructure) |
| Destinations | Virtually any data warehouse, lake, or database | Built-in or external Postgres instance (Postgres FDW-compatible plugins only) |
| Connector quality | CloudQuery's internal developers maintain all official connectors to ensure consistent quality | Most plugins are maintained by Turbot's internal developer teams, with some community plugins |
| Primary focus | Performance | Real-time data |
| Orchestrator integration | Can run directly or embedded in Airflow, Dagster, Step Functions, Prefect, or any other orchestrator thanks to its lightweight, stand-alone, cross-platform design | Can run directly or embedded in various orchestrators and CI/CD platforms, but is not really designed to be used this way |
| License | Framework is open source; plugins are closed-source and commercial | Framework is open source; plugins are open source |
| Pricing | Volume-based pricing that varies by connector; flat-fee yearly quotes based on average usage are available to protect against spikes; a free quota is available for all plugins | The CLI version is free; the cloud offering is priced on compute time, storage capacity, and number of users |
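Because CloudQuery is a single binary driven from the command line, an orchestrator task can simply shell out to it. The sketch below assumes the `cloudquery` binary is on `PATH` and that a spec file named `config.yml` (a hypothetical name) defines the source and destination; the `build_sync_command` and `run_sync` helpers are illustrative names, not part of any CloudQuery SDK.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_sync_command(config_path: str, binary: str = "cloudquery") -> List[str]:
    """Return the argv for a CloudQuery sync of the given spec file."""
    # `cloudquery sync <spec>` is the CLI's sync entry point; any extra flags
    # your deployment needs would be appended to this list.
    return [binary, "sync", str(Path(config_path))]


def run_sync(config_path: str) -> int:
    """Run a sync as a plain subprocess, the way an orchestrator task would."""
    if shutil.which("cloudquery") is None:
        raise FileNotFoundError("cloudquery binary not found on PATH")
    result = subprocess.run(build_sync_command(config_path), check=False)
    # A non-zero return code signals a failed sync to the caller.
    return result.returncode
```

In Airflow, Dagster, or Prefect, `run_sync` would be wrapped in whatever task or op construct that orchestrator provides; nothing about the call itself is orchestrator-specific.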
Architecture and Deployment
Both CloudQuery and Steampipe ship as a single binary with pluggable components. However, their key differences are in how they process data and queries.
CloudQuery extracts the data from the APIs and loads it into an instance of virtually any database/lake/warehouse you choose, where you can then query the data in the native language for that data store.
Steampipe, by contrast, translates PostgreSQL queries directly into API calls and computes the results in real time, with the option to store them.
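The difference between the two models shows up in how many times the upstream API is hit. The toy model below is not how either tool is implemented: `FakeCloudAPI` is a hypothetical stand-in for a cloud provider API, and an in-memory SQLite database stands in for the destination. The sync style extracts once and answers every query from the stored copy; the live style re-fetches from the API for each query.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[str, int]
SCHEMA = "CREATE TABLE buckets (name TEXT, public INTEGER)"


class FakeCloudAPI:
    """Hypothetical stand-in for a cloud API that counts every call made to it."""

    def __init__(self, buckets: Iterable[Row]):
        self.buckets = list(buckets)
        self.calls = 0

    def list_buckets(self) -> List[Row]:
        self.calls += 1
        return list(self.buckets)


def _load(api: FakeCloudAPI) -> sqlite3.Connection:
    """Extract from the API and load the rows into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO buckets VALUES (?, ?)", api.list_buckets())
    return db


def sync_then_query(api: FakeCloudAPI, queries: List[str]) -> List[list]:
    """Sync style: one extraction, then every query runs against the stored copy."""
    db = _load(api)
    try:
        return [db.execute(q).fetchall() for q in queries]
    finally:
        db.close()


def query_live(api: FakeCloudAPI, queries: List[str]) -> List[list]:
    """Live style: every query re-fetches from the API before it is answered."""
    results = []
    for q in queries:
        db = _load(api)
        try:
            results.append(db.execute(q).fetchall())
        finally:
            db.close()
    return results


buckets = [("logs", 0), ("site", 1)]
queries = [
    "SELECT name FROM buckets WHERE public = 1",
    "SELECT COUNT(*) FROM buckets",
]
synced_api, live_api = FakeCloudAPI(buckets), FakeCloudAPI(buckets)
synced = sync_then_query(synced_api, queries)
live = query_live(live_api, queries)
print(synced_api.calls, live_api.calls)  # 1 2
```

Both approaches return the same answers here, but API calls grow with the number of queries only in the live version. The trade-off runs the other way for freshness: if the buckets change between queries, only the live version sees the change until the next sync.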
Data sources and destination connectors
Sources and destinations are the bread and butter of any data integration solution, and the two platforms differ in important ways here, each with its own pros and cons.
CloudQuery and Steampipe have similar core sets of source connectors, both focusing on cloud infrastructure and collecting usage, security, and configuration data.
Both frameworks offer a Golang SDK with a fairly comparable developer experience, so developing new connectors is straightforward on either platform. CloudQuery edges ahead here by supporting connectors written in any language that supports Apache Arrow and gRPC.
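At a high level, both SDKs ask a connector author for broadly the same two things: a table schema and a function that fetches rows from the upstream API. The sketch below illustrates that shape in Python; `Table`, `Column`, and `list_instances` are hypothetical names, not the API of either SDK, and the API response is stubbed.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Column:
    name: str
    data_type: str  # illustrative label; real SDKs use Arrow or Postgres types


@dataclass
class Table:
    name: str
    columns: List[Column]
    # Resolver that pages through an upstream API and yields one dict per row.
    resolve: Callable[[], Iterator[dict]]

    def rows(self) -> Iterator[tuple]:
        """Yield each resolved record as a tuple in column order.

        A missing field raises KeyError, surfacing schema drift early.
        """
        names = [c.name for c in self.columns]
        for record in self.resolve():
            yield tuple(record[n] for n in names)


def list_instances() -> Iterator[dict]:
    # Stubbed, paginated API response; a real resolver would call a cloud SDK.
    pages = [
        [{"id": "i-1", "running": True}],
        [{"id": "i-2", "running": False}],
    ]
    for page in pages:
        yield from page


instances = Table(
    name="example_instances",
    columns=[Column("id", "utf8"), Column("running", "bool")],
    resolve=list_instances,
)
```

Calling `list(instances.rows())` returns `[("i-1", True), ("i-2", False)]`. The architectural difference lies in what consumes those rows: a destination loaded during a sync, or a query engine calling the resolver at query time.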
Unlike CloudQuery, Steampipe has limited support for external databases and relies heavily on Postgres integration for storing results.
While CloudQuery has a competitive number of production-ready connectors (source plugins), it prioritizes performance over sheer connector count. This pays off in performance comparisons: our users and customers have seen speed improvements of more than 50x after switching to CloudQuery.
More on this coming soon!
Pricing and Costs
Using the Steampipe CLI is completely free with most plugins, whereas CloudQuery offers a free quota and charges once the volume of synced rows exceeds that quota.
The cloud version of Steampipe (Turbot Pipes) charges for the number of users, compute time (per second), and storage capacity used. CloudQuery's cloud offering, by contrast, is still priced on the number of rows, with the same free quota as the CLI, plus additional fees for data egress, vCPUs used (per hour), and vRAM used (per hour), and no limit on the number of users.
CloudQuery is a sync-based solution, so once the data is in your database, you aren't charged again for querying it. Steampipe, however, is a real-time solution, so you pay for it every time you query your data.
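The pricing argument reduces to a simple model of how each bill responds to query volume. In the sketch below, all prices and quotas are placeholder parameters, not either vendor's actual rates, and the model deliberately omits Pipes' storage and per-user fees as well as CloudQuery Cloud's egress, vCPU, and vRAM fees.

```python
def row_based_cost(rows_synced: int, free_rows: int, price_per_row: float) -> float:
    """Sync-based pricing: cost tracks rows synced; query count never enters."""
    return max(0, rows_synced - free_rows) * price_per_row


def compute_based_cost(queries: int, seconds_per_query: float,
                       price_per_second: float) -> float:
    """Query-time pricing: each query consumes billable compute seconds."""
    return queries * seconds_per_query * price_per_second
```

With any placeholder values, doubling `queries` doubles `compute_based_cost`, while `row_based_cost` changes only when more rows are synced.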
Conclusion
We’re obviously biased, but we think CloudQuery is the clear winner in flexibility, performance, and pricing.
Steampipe does have its advantages (real-time queries being the foremost - though CloudQuery’s event-based syncs definitely challenge that), but the increased load on Source APIs and the significantly worse performance make it difficult to recommend for production use cases beyond simple inspection and troubleshooting tasks.
Ready to get started with CloudQuery? You can try out CloudQuery locally with our quick start guide or explore the CloudQuery Platform (currently in beta) for a more scalable solution.
Want help getting started? Join the CloudQuery community to connect with other users and experts, or message our team directly if you have any questions.