Announcing the CloudQuery BigQuery Source Integration
Introducing the new CloudQuery BigQuery source plugin!
BigQuery is a popular, fully managed, serverless data warehouse designed for large datasets. We have now added a BigQuery Source Integration that enables syncing tables and views from a dataset to any supported CloudQuery destination.
Data teams often choose BigQuery for its scalability, ease of use, and integration with other Google Cloud services, and it is commonly used alongside our GCP Source Integration. These teams rely on BigQuery for analytics tasks such as ad-hoc queries, data exploration, and large-scale data processing. Sometimes, however, you need to run an advanced analysis, apply transformations, or consolidate the data in a different database. The new source integration enables just that.
On top of syncing the individual tables and views from the source dataset, the new plugin supports queries at the source, thus enabling on-the-fly transformation of the data.
Example Configuration
Set up CloudQuery locally, or use the CloudQuery Platform.
To sync an entire dataset to a PostgreSQL database, use the default configuration: specify your Project ID, Dataset ID (name), and the connection string for the database. For other options and details on how to authenticate with BigQuery, see the BigQuery Source Integration documentation.
```yaml
kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.0.0"
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v7.6.0"
  spec:
    connection_string: ${POSTGRESQL_CONNECTION_STRING}
```
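With both specs saved to a single file (here called `config.yaml`, a name chosen for this example), a sync can be started with the CloudQuery CLI. The exported values below are placeholders, not real credentials:

```shell
# Export the values referenced in the spec, then run the sync.
export PROJECT_ID=my-gcp-project
export DATASET_ID=usage
export POSTGRESQL_CONNECTION_STRING="postgresql://user:password@localhost:5432/mydb"

cloudquery sync config.yaml
```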
Specifying a query
In this example, assume a dataset `usage` with a table `usage_events` that has the columns `customer_id`, `timestamp`, and `miles`. Each entry represents the distance driven by a customer during a single driving session that started at the timestamp. We would like to sync only a summary of the distance driven over the last 30 days, grouped by customer, to a PostgreSQL database. Here's what the configuration looks like with the SQL query included:
```yaml
kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.0.0"
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
    queries:
      - name: monthly_usage
        query: >
          select customer_id, sum(miles) as usage from usage.usage_events
          where timestamp > current_timestamp - interval 30 day
          group by customer_id
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v7.6.0"
  spec:
    connection_string: ${POSTGRESQL_CONNECTION_STRING}
```
This configuration will not sync the entire dataset. Instead, it will create only the `monthly_usage` table and populate it with the results of the query. You can include as many queries as you want in one sync configuration.
Find more resources on the Hub
With this plugin, you can sync data from BigQuery to a destination of your choice. However, you can get data from more sources than just BigQuery. Check out our other database source integrations on the CloudQuery Hub.
If you'd like to start using the BigQuery source integration, download CloudQuery today. You can also discuss use cases and get feedback from other CloudQuery users in the CloudQuery Community.
Ready to dive deeper? Contact CloudQuery here or join the CloudQuery Community to connect with other users and experts.
You can also try out CloudQuery locally with our quick start guide or explore the CloudQuery Platform (currently in beta) for a more scalable solution.
Written by Michal Brutvan
Michal is CloudQuery's senior product manager and is responsible for new features and CloudQuery's product roadmap. He has held a variety of product ownership roles and, prior to that, worked as a software engineer.