
Migrating from CloudQuery v0 to v1

Herman Schaaf

We are thrilled to announce the release of the first major version of CloudQuery (see our v1 announcement blog post for details). The new release comes with a range of exciting new features, and this page will help you migrate an existing CloudQuery installation from v0 to v1.

Changes in v1

The announcement blog post lists many of the important improvements, and we won't reiterate them all here. Most changes are internal and developer-facing, but some do affect existing CloudQuery users. These are:

Changes to the Configuration Format

CloudQuery v1 introduces a new configuration format that is closely related to the old one, but existing v0 configurations will need some adjustments to work with the v1 CLI. The main difference is that, because v1 supports multiple destinations, source and destination plugins now have separate configurations.

Source Plugins

The new configuration format for source plugins is as follows:
```yaml
kind: source
spec:
  ## Required. Name of the plugin to use
  name: 'aws'

  ## Required. Must be a specific version starting with v, e.g. v1.2.3
  version: 'v30.1.0'

  ## Optional. Default: "github". Available: "github", "cloudquery", "local", "grpc"
  registry: 'cloudquery'

  ## Plugin path. For official plugins, this should be in the format "cloudquery/<name>", e.g. "cloudquery/aws"
  path: 'cloudquery/aws'

  ## Required. Use ["*"] to sync all tables, or list specific tables. Note that syncing all tables can be slow
  ## See all tables: https://hub.cloudquery.io/plugins/source/cloudquery/aws/tables
  tables: ['aws_s3_buckets']

  ## Required. All destinations you want to sync data to
  destinations: ['postgresql']

  spec:
    ## Plugin-specific configuration
```
Check the source spec documentation for the general layout, and the individual plugin documentation for details on how to configure the plugin-specific spec. In general, these options are the same as in v0, and all the same authentication methods are still supported.

Destination Plugins

The new configuration format for destination plugins (e.g. PostgreSQL, BigQuery, Snowflake, and more) is as follows:
```yaml
kind: destination
spec:
  ## Required. Name of the plugin
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  ## Required. Must be a specific version starting with v, e.g. v1.2.3
  version: "v8.7.5"
  ## Optional. Default: "overwrite-delete-stale". Available: "overwrite-delete-stale", "overwrite", "append". Not all modes are
  ## supported by all plugins, so make sure to check the plugin documentation for more details.
  write_mode: "overwrite-delete-stale" # overwrite-delete-stale, overwrite, append

  spec:
    ## Plugin-specific configuration for PostgreSQL:

    ## Required. Connection string to your PostgreSQL instance
    connection_string: "postgresql://postgres:pass@localhost:5432/postgres?sslmode=disable"
```
Check the destination spec documentation for the general layout, and the individual destination plugin documentation for details on how to configure the plugin-specific spec. In general, these options are the same as in v0, and all the same authentication methods are still supported.

Changes to the CLI Commands

Users of CloudQuery v0 will be familiar with the main commands init and fetch. Both have changed in v1: init is no longer available (you should write configuration files manually), and fetch has been replaced by sync.

Init

init was a command that generated a starter configuration template, but it is no longer a command in v1 of the CLI. Instead, please refer to our Quickstart guide to see how source and destination plugins should be configured.
The previous init command also generated a full list of tables to fetch. In v1, you can fetch all tables by using a wildcard entry in the source configuration file:
```yaml
tables: ['*']
```
This can also be combined with the skip_tables option to fetch all tables except a subset:
```yaml
tables: ['*']
skip_tables: ['aws_accessanalyzer_analyzers', 'aws_acm_certificates']
```

Sync

cloudquery sync replaces the v0 cloudquery fetch command.
Functionally it is the same: it loads data from a source into a destination. However, sync supports multiple destinations, while fetch supported only PostgreSQL. This change also comes with a new configuration format, described in Changes to the Configuration Format above.
cloudquery sync must be passed a path to a configuration file, or to a directory containing configuration files. For example, to sync using all .yml files in a directory named config:
```bash
cloudquery sync config/
```
Or to sync using a single YAML file named config.yml:
```bash
cloudquery sync config.yml
```
In this case, config.yml should contain at least one source and one destination configuration, separated by a line containing only three dashes (---). See Files and Directories for more on this.
See cloudquery sync --help for more details, or check our online reference.
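As a sketch of the directory approach, the source and destination configurations can live in separate files inside config/. The file names aws.yml and postgresql.yml are hypothetical (per the description above, all .yml files in the directory are loaded), and the field values are taken from the examples earlier on this page. Since each file holds a single configuration, no --- separator should be needed.

```yaml
# config/aws.yml (hypothetical file name)
kind: source
spec:
  name: 'aws'
  path: 'cloudquery/aws'
  registry: 'cloudquery'
  version: 'v30.1.0'
  tables: ['aws_s3_buckets']
  # Must match the `name` of the destination defined in config/postgresql.yml
  destinations: ['postgresql']
```

```yaml
# config/postgresql.yml (hypothetical file name)
kind: destination
spec:
  name: 'postgresql'
  path: 'cloudquery/postgresql'
  registry: 'cloudquery'
  version: 'v8.7.5'
  spec:
    # Placeholder connection string; replace with your own
    connection_string: 'postgresql://postgres:pass@localhost:5432/postgres?sslmode=disable'
```

With both files in place, running cloudquery sync config/ would pick up the source and the destination together.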

Files and Directories

The sync command supports loading configuration from files or directories. You can also combine multiple source and destination configs in a single file, using --- on its own line to separate the sections. For example:
```yaml
kind: source
spec:
  name: 'aws'
  version: 'v30.1.0'
  # rest of source spec here
---
kind: destination
spec:
  name: 'postgresql'
  version: 'v8.7.5'
  # rest of destination spec here
```
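Because a source can now write to more than one destination, a combined file can also define several destinations for a single source. The sketch below assumes, consistent with the examples above, that each destination's `name` is the identifier referenced from the source's `destinations` list and that `path` selects the plugin. The names postgresql-local and postgresql-reporting, and both connection strings, are hypothetical placeholders.

```yaml
kind: source
spec:
  name: 'aws'
  path: 'cloudquery/aws'
  registry: 'cloudquery'
  version: 'v30.1.0'
  tables: ['aws_s3_buckets']
  # Hypothetical names; each must match the `name` of a destination below
  destinations: ['postgresql-local', 'postgresql-reporting']
---
kind: destination
spec:
  name: 'postgresql-local'
  path: 'cloudquery/postgresql'
  registry: 'cloudquery'
  version: 'v8.7.5'
  spec:
    # Placeholder connection string for the first database
    connection_string: 'postgresql://postgres:pass@localhost:5432/postgres?sslmode=disable'
---
kind: destination
spec:
  name: 'postgresql-reporting'
  path: 'cloudquery/postgresql'
  registry: 'cloudquery'
  version: 'v8.7.5'
  spec:
    # Placeholder connection string for a second, separate database
    connection_string: 'postgresql://postgres:pass@reporting-host:5432/postgres?sslmode=disable'
```

Check each destination plugin's documentation before relying on this layout in production.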

Changes to Tables and Schemas

Finally, during our work on v1, we endeavoured to make the table schemas more consistent, predictable, and aligned with their upstream APIs. As a result, some breaking schema changes were necessary.

Start from a Clean Database

CloudQuery v1 introduces functionality to automatically perform backwards-compatible PostgreSQL migrations when new columns or tables are added. However, this functionality relies on starting from a clean slate in v1: if you run it against a database that still contains tables from v0, there is a good chance it will fail.
Therefore, it is important that you start from a clean database. This can mean either creating a new database and pointing the v1 configuration at it, or dropping all the tables in your v0 database.
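If you opt for a new database, the only change needed in the destination configuration is the connection string. A minimal sketch, assuming you have already created an empty PostgreSQL database (for example with PostgreSQL's createdb utility); the database name cloudquery_v1 and the credentials are hypothetical placeholders:

```yaml
kind: destination
spec:
  name: 'postgresql'
  path: 'cloudquery/postgresql'
  registry: 'cloudquery'
  version: 'v8.7.5'
  spec:
    # Points at a new, empty database (hypothetical name: cloudquery_v1)
    # rather than the database that still holds the v0 tables
    connection_string: 'postgresql://postgres:pass@localhost:5432/cloudquery_v1?sslmode=disable'
```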

Get Help / Ask Questions

If you run into issues not covered here, or have any questions about migrating or about CloudQuery v1, don't hesitate to reach out to our Community. We're a friendly group of developers and would love to help however we can.

Written by Herman Schaaf

Herman is the Director of Engineering at CloudQuery and an Apache Arrow contributor. A polyglot with a preference for Go and Python, he has spoken at QCon London and Data Council New York.
