Tutorials
Data-Driven Decision Making - Using CloudQuery to Export and Analyze HubSpot Data
At CloudQuery, we’re all about giving you granular access to your data in an environment or database you are comfortable using. With CloudQuery, you can export your HubSpot data to your preferred database. Using CloudQuery to extract and load data from HubSpot gives you complete control over how you use your CRM data. Access to your raw HubSpot CRM data in your database means it can be incorporated into your already existing analytics suite. Once you have your HubSpot data in your database, you can analyze it, combine it with other data, and transform it into actionable data in your database schema.
Once your data has been loaded into your database, you will use the following tables created by CloudQuery's data movement tool:
hubspot_crm_companies
, hubspot_crm_contacts
, and hubspot_crm_deals
. These tables represent organizations, individual contacts, and sales deals in your HubSpot CRM, respectively. Each of these tables has JSON fields titled associations
and properties
, which make it difficult to easily join associated tables and gain descriptive information about these entities in your HubSpot data without SQL expertise.How to load your HubSpot data to your database #
- Download CloudQuery here
- Configure your HubSpot source and your database destination
- The following queries can be used to create views or tables that allow you to flatten these JSON fields into a columnar structure.
Parsing HubSpot associations data with PostgreSQL #
Now that you have exported your HubSpot data to your database let’s create a reference dataset of associations using the
hubspot_crm_companies
table:select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id as company_id,
contact ->> 'id' as contact_id,
contact ->> 'type' as association_type
from
hubspot_crm_companies,
json_array_elements(associations -> 'contacts' -> 'results') as contact
Because this table focuses on HubSpot companies, you can alias the
id
column as company_id
to avoid confusion. You can then leverage PostgreSQL JSON functions to isolate contacts that are associated with companies and parse the ensuing id
as contact_id
, as well as the type of association from the original JSON column.Next, let’s create an associations reference dataset from the
hubspot_crm_contacts
table:select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id as contact_id,
company ->> 'id' as company_id,
company ->> 'type' as association_type
from
hubspot_crm_contacts,
json_array_elements(associations -> 'companies' -> 'results') as company
Just as above in the companies example you can alias the
id
from the hubspot_crm_contacts
table as contact_id
and the id
from the associations
JSON column as company_id
to avoid confusion.Finally, let’s create an associations dataset for the
hubspot_crm_deals
table:select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id as deal_id,
company ->> 'id' as company_id,
company ->> 'type' as association_type
from
hubspot_crm_deals,
json_array_elements(associations -> 'companies' -> 'results') as company
The same logic applies to this table. As deals are tied to companies in HubSpot you will want to separate the
id
field from the hubspot_crm_deals
table and the id
value from the associations
JSON field by aliasing them as deal_id
and company_id
, respectively.Parsing HubSpot properties data with PostgreSQL #
The other JSON column provided by HubSpot is the
properties
column. In the hubspot_crm_companies
table, this column contains descriptive information about companies, such as their industry, country, or any custom properties you may want to add to your HubSpot CRM.Here’s an example you can use to flatten company properties into columns using PostgreSQL:
select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id,
properties ->> 'country' as country,
(properties ->> 'createdate')::timestamp as created_date,
properties ->> 'description' as description,
properties ->> 'domain' as domain,
(properties ->> 'hs_lastmodifieddate')::timestamp as hs_last_modified_date,
properties ->> 'hs_object_id' as hs_object_id,
properties ->> 'industry' as industry
from
hubspot_crm_companies
Next is an example of parsing individual HubSpot contact properties:
select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id,
(properties ->> 'createdate')::timestamp as created_date,
properties ->> 'email' as email,
properties ->> 'firstname' as first_name,
properties ->> 'lastname' as last_name,
properties ->> 'hs_object_id' as hs_object_id,
(properties ->> 'lastmodifieddate')::timestamp as last_modified_date
from
hubspot_crm_contacts
Finally, here is an example of parsing HubSpot deal properties:
select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id,
properties ->> 'amount' as amount,
(properties ->> 'closedate')::timestamp as close_date,
(properties ->> 'createdate')::timestamp as created_date,
properties ->> 'dealname' as deal_name,
properties ->> 'dealstage' as deal_stage,
(properties ->> 'lastmodifieddate')::timestamp as last_modified_date,
properties ->> 'hs_object_id' as hs_object_id,
properties ->> 'pipeline' as pipeline
from
hubspot_crm_deals
Parsing HubSpot associations data with BigQuery #
Here, you can repeat the above steps, but the queries will use BigQuery SQL syntax. These queries will leverage BigQuery JSON functions. Note: BigQuery is ‘append-only’, preferred by design, so you can leverage the
_cq_sync_time
field to select only the most recent synced data. As with the above PostgreSQL examples, you can use the following query to create the same company associations dataset using BigQuery syntax:select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id as company_id,
json_value(contact, '$.id') as contact_id,
json_value(contact, '$.type') as association_type
from
your_bq_dataset.hubspot_crm_companies,
unnest(json_extract_array(associations, '$.contacts.results')) as contact
where
_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_companies)
Next is the associations dataset using the HubSpot contacts table:
select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id as contact_id,
json_value(company, '$.id') as company_id,
json_value(company, '$.type') as association_type
from
your_bq_dataset.hubspot_crm_contacts,
unnest(json_extract_array(associations, '$.companies.results')) as company
where
_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_contacts)
Finally, the associations' dataset using the HubSpot deals table:
select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id as deal_id,
json_value(company, '$.id') as company_id,
json_value(company, '$.type') as association_type
from
your_bq_dataset.hubspot_crm_deals,
unnest(json_extract_array(associations, '$.companies.results')) as company
where
_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_deals)
Parsing HubSpot properties data with BigQuery #
The same criteria applies to the below queries, you can use BigQuery’s JSON functions to parse HubSpot data to columns. You can use the below query to create columns from the HubSpot companies
properties
column for a view or new table:select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id,
json_value(properties, '$.country') as country,
cast(json_value(properties, '$.createdate') as timestamp) as created_date,
json_value(properties, '$.description') as description,
json_value(properties, '$.domain') as domain,
cast(json_value(properties, '$.hs_lastmodifieddate') as timestamp) as hs_last_modified_date,
json_value(properties, '$.hs_object_id') as hs_object_id,
json_value(properties, '$.industry') as industry
from
your_bq_dataset.hubspot_crm_companies
where
_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_companies)
Next is an example you can use to flatten HubSpot contact properties to columns:
select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id,
cast(json_value(properties, '$.createdate') as timestamp) as created_date,
json_value(properties, '$.email') as email,
json_value(properties, '$.firstname') as first_name,
json_value(properties, '$.lastname') as last_name,
json_value(properties, '$.hs_object_id') as hs_object_id,
cast(json_value(properties, '$.lastmodifieddate') as timestamp) as last_modified_date
from
your_bq_dataset.hubspot_crm_contacts
where
_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_contacts)
And last is an example of parsed HubSpot deals properties:
select
_cq_sync_time,
_cq_source_name,
_cq_id,
_cq_parent_id,
id,
json_value(properties, '$.amount') as amount,
cast(json_value(properties, '$.closedate') as timestamp) as close_date,
cast(json_value(properties, '$.createdate') as timestamp) as created_date,
json_value(properties, '$.dealname') as deal_name,
json_value(properties, '$.dealstage') as deal_stage,
cast(json_value(properties, '$.hs_lastmodifieddate') as timestamp) as last_modified_date,
json_value(properties, '$.hs_object_id') as hs_object_id,
json_value(properties, '$.pipeline') as pipeline
from
your_bq_dataset.hubspot_crm_deals
where
_cq_sync_time = (select max(_cq_sync_time) from your_bq_dataset.hubspot_crm_deals)
If you need to gain insights from or visualize your HubSpot CRM data quickly, CloudQuery lets you fully control the process. Download CloudQuery to gain immediate insights into your customer relations and sales pipelines. If you have any questions or want to connect with our engineering team, contact us or join our Community.
Written by Kevin Rheinheimer
Kevin is a senior data engineer at CloudQuery, specializing in cloud data infrastructure and application development.