Tutorials
Migrating from CloudQuery v0 to v1
We are thrilled to announce the release of the first major version of CloudQuery (see our v1 announcement blog post for details!). With the new release comes a range of exciting new features, and this page is here to help you migrate an existing CloudQuery installation from v0 to v1.
Changes in V1 #
The announcement blog post lists many of the important improvements, and we won't reiterate them all here. Most changes are internal and developer-facing, but some do affect existing CloudQuery users. These are:
Changes to the Configuration Format #
V1 introduces a new configuration format that is closely related to the old one, but an existing configuration will need some massaging to work with the CloudQuery v1 CLI. Most notably, because we now support multiple destinations, there are now separate configurations for source and destination plugins.
Source Plugins #
The new configuration format for source plugins is as follows:
```yaml
kind: source
spec:
  ## Required. Name of the plugin to use
  name: 'aws'
  ## Required. Must be a specific version starting with v, e.g. v1.2.3
  version: 'v30.1.0'
  ## Optional. Default: "github". Available: "github", "cloudquery", "local", "grpc"
  registry: 'cloudquery'
  ## Plugin path. For official plugins, this should be in the format "cloudquery/<name>", e.g. "cloudquery/aws"
  path: 'cloudquery/aws'
  ## Required. You can use ["*"] to sync all tables or specify specific tables. Please note that syncing all tables can be slow.
  ## See all tables: https://hub.cloudquery.io/plugins/source/cloudquery/aws/tables
  tables: ['aws_s3_buckets']
  ## Required. All destinations you want to sync data to.
  destinations: ['postgresql']
  spec:
    ## Plugin-specific configuration.
```
Check the source spec documentation for general layout, and individual plugin documentation for details on how to configure the plugin-specific spec. Generally these will be the same as in v0, and all the same authentication functionality is still supported.
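For illustration only, a complete AWS source config with a nested plugin-specific section might look like the sketch below. The nested fields (such as `regions`) are assumptions here, so check the AWS source plugin documentation for the options it actually supports.
```yaml
kind: source
spec:
  name: 'aws'
  path: 'cloudquery/aws'
  registry: 'cloudquery'
  version: 'v30.1.0'
  tables: ['aws_s3_buckets']
  destinations: ['postgresql']
  spec:
    # Illustrative plugin-specific options; the available fields differ per plugin,
    # so consult the plugin's own documentation before using them.
    regions: ['us-east-1', 'eu-west-1']
```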
Destination Plugins #
The new configuration format for destination plugins (e.g. PostgreSQL, BigQuery, Snowflake, and more) is as follows:
```yaml
kind: destination
spec:
  ## Required. Name of the plugin
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  ## Required. Must be a specific version starting with v, e.g. v1.2.3
  version: "v8.7.5"
  ## Optional. Default: "overwrite-delete-stale". Available: "overwrite-delete-stale", "overwrite", "append".
  ## Not all modes are supported by all plugins, so make sure to check the plugin documentation for more details.
  write_mode: "overwrite-delete-stale"
  spec:
    ## Plugin-specific configuration for PostgreSQL:
    ## Required. Connection string to your PostgreSQL instance
    connection_string: "postgresql://postgres:pass@localhost:5432/postgres?sslmode=disable"
```
Check the destination spec documentation for general layout, and individual destination plugin documentation for details on how to configure the plugin-specific spec part. Generally these will be the same as in v0, and all the same authentication functionality is still supported.
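As a side note, if your CLI version supports environment-variable substitution in spec files (check the configuration documentation), you can keep credentials out of the file by referencing an environment variable instead of a literal connection string. A minimal sketch, assuming a `PG_CONNECTION_STRING` variable is set:
```yaml
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v8.7.5"
  spec:
    # Assumes environment-variable substitution is available in your CLI version;
    # PG_CONNECTION_STRING is a hypothetical variable name set in your shell or CI.
    connection_string: "${PG_CONNECTION_STRING}"
```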
Changes to the CLI Commands #
Users of CloudQuery v0 will be familiar with the main commands `init` and `fetch`. Both have changed in v1, and `init` is no longer available (you should write configuration files manually).
Init #
`init` was a command that generated a starter configuration template, but it is no longer available in v1 of the CLI. Instead, please refer to our Quickstart guide to see how source and destination plugins should be configured.
The previous `init` command also generated a full list of tables to fetch. In v1, you can fetch all tables by using a wildcard entry, `tables: ['*']`, in the source configuration file. This can also be combined with the `skip_tables` option to fetch all tables except some subset:
```yaml
tables: ['*']
skip_tables: ['aws_accessanalyzer_analyzers', 'aws_acm_certificates']
```
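Depending on your CLI version, table matching may also accept glob-style wildcards narrower than a bare `'*'`; this is an assumption to verify against the tables documentation, but if supported it lets you select a whole family of tables at once, for example:
```yaml
# Assumed example: sync every EC2-related table except instances.
# Verify wildcard support for your CLI version in the tables documentation.
tables: ['aws_ec2_*']
skip_tables: ['aws_ec2_instances']
```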
Sync #
`cloudquery sync` replaces the v0 `cloudquery fetch` command. Functionally it is still the same: it loads data from a source to a destination. However, `sync` now supports multiple destinations, while `fetch` only supported PostgreSQL. With this change also comes a change in the expected configuration format; see the next section for more details.
`cloudquery sync` needs to be passed a path to a configuration file or a directory containing configuration files. For example, to sync using all `.yml` files in a directory named `config`:
```bash
cloudquery sync config/
```
Or to sync using a single YAML file named `config.yml`:
```bash
cloudquery sync config.yml
```
In this case `config.yml` should contain at least one source and one destination configuration, separated by a line containing three dashes (`---`). More about this in Files and Directories below.
See `cloudquery sync --help` for more details, or check our online reference.
Files and Directories #
The `sync` command supports loading configuration from files or directories, and you may choose to combine multiple source and destination configs in a single file, using `---` on its own line to separate the different sections. For example:
```yaml
kind: source
spec:
  name: 'aws'
  version: 'v30.1.0'
  # rest of source spec here
---
kind: destination
spec:
  name: 'postgresql'
  version: 'v8.7.5'
  # rest of destination spec here
```
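Alternatively, you can keep each spec in its own file and point `sync` at the containing directory; the file names below are arbitrary examples.
```bash
# Example layout (file names are arbitrary):
#   config/aws.yml         -> kind: source
#   config/postgresql.yml  -> kind: destination
# Sync using every .yml file in the directory:
cloudquery sync config/
```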
Changes to Tables and Schemas #
Finally, during our work for v1, we endeavoured to make the table schemas more consistent, predictable and aligned with their upstream APIs. As such, some breaking changes to the schema were necessary.
Start from a Clean Database #
V1 introduces functionality to automatically perform backwards-compatible Postgres migrations when new columns or tables are added. However, this functionality relies on starting fresh in v1: if you run it against a database that contains tables from v0, there is a good chance it will fail.
Therefore, it is important that you start from a clean database. This can either mean creating a new database and pointing the v1 configuration there, or dropping all the tables in your v0 database.
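For example, if your destination is a local PostgreSQL instance, one option is to create a brand-new database for v1 and point `connection_string` at it. This is only a sketch, assuming the standard PostgreSQL client tools are installed and the database name is up to you:
```bash
# Create a fresh database for CloudQuery v1 (assumes local PostgreSQL and createdb on your PATH).
createdb cloudquery_v1
# Then point the destination spec at it, e.g.:
# connection_string: "postgresql://postgres:pass@localhost:5432/cloudquery_v1?sslmode=disable"
```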
Get Help / Ask Questions #
If you run into issues not covered here, or have any questions about migrating or CloudQuery v1, don't hesitate to reach out on our Community. We're a friendly community for developers and would love to help however we can.
Written by Herman Schaaf
Herman is the Director of Engineering at CloudQuery and an Apache Arrow contributor. A polyglot with a preference for Go and Python, he has spoken at QCon London and Data Council New York.