Setting up Cargo on Google BigQuery
In this guide, we will walk you through setting up Google BigQuery as your system of records in Cargo. This setup ensures Cargo has the necessary permissions in Google Cloud to read and write data efficiently.
As a result, BigQuery will serve as a persistence layer that Cargo will use when running automations.
Permissions
To prevent data loss, Cargo has a few limitations on what it can and cannot do within your Google BigQuery instance.
What Cargo can do:
- Read data from schemas and tables, even if they are spread across multiple databases
- Create and write data into new schemas and tables
What Cargo will never do:
- Overwrite existing schemas and tables (Cargo always creates its own schemas and tables when needed)
Before you begin
To start, you need an existing Google Cloud Project with a payment method and billing enabled. Follow the official Google guide.
Once you have created the project, you can continue with this guide, which covers enabling and creating the necessary elements in your new GCP project:
- BigQuery API & Cloud Resource Manager API
- BigQuery dataset (dedicated to Cargo)
- Object Storage Bucket (dedicated to Cargo)
- Service Account
If you have an existing BigQuery project and technical knowledge, you may skip any of these steps.
Enable the required APIs
Cargo uses two Google APIs that must be enabled. To do so:
- Go to the Google Cloud Console.
- Select APIs & Services.
- Select Enabled APIs & Services.
- Search for and enable the following APIs: BigQuery API, Cloud Resource Manager API.
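If you prefer to script this step, the same APIs can be enabled through the Service Usage API. Below is a minimal sketch in Python, assuming the google-api-python-client package is installed and you are authenticated with application-default credentials; the project ID my-gcp-project is a placeholder:

```python
from googleapiclient import discovery

project = "my-gcp-project"  # placeholder: replace with your GCP project ID

# Build a Service Usage API client using application-default credentials.
serviceusage = discovery.build("serviceusage", "v1")

for api in ("bigquery.googleapis.com", "cloudresourcemanager.googleapis.com"):
    # services.enable returns a long-running operation; for a one-off
    # setup script it is usually enough to fire it and move on.
    serviceusage.services().enable(
        name=f"projects/{project}/services/{api}", body={}
    ).execute()
    print(f"Requested enablement of {api}")
```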
Create a storage bucket
To enable Cargo to load and unload data from BigQuery, we need a storage bucket dedicated to this purpose.
To create a new bucket:
- Go to the Google Cloud Console.
- Search Object storage in the search bar.
- Create a new bucket and follow the steps, keeping a note of the bucket name.
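Alternatively, the bucket can be created with the official google-cloud-storage client. A minimal sketch, assuming application-default credentials; the project ID, bucket name, and location are placeholders (the location should match the dataset you create next):

```python
from google.cloud import storage

# Placeholders: replace with your project ID, a globally unique bucket
# name, and the location you want to use.
client = storage.Client(project="my-gcp-project")
bucket = client.create_bucket("my-cargo-bucket", location="EU")
print(f"Created bucket {bucket.name} in {bucket.location}")
```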
Create a dataset
As mentioned above, Cargo will only write data to the selected dataset, ensuring no data loss. We recommend creating a new dataset dedicated to Cargo.
To create a new dataset:
- Go to the Google Cloud Console.
- Search BigQuery in the search bar.
- In BigQuery Studio, click the three dots next to the project name.
- Click Create dataset.
- Follow the steps, and keep a note of the dataset name and location.
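The dataset can also be created with the official google-cloud-bigquery client. A minimal sketch, again with placeholder project and dataset IDs, assuming application-default credentials:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project ID

# Placeholder dataset ID; the location should match your storage bucket.
dataset = bigquery.Dataset("my-gcp-project.cargo_dataset")
dataset.location = "EU"

dataset = client.create_dataset(dataset, exists_ok=True)
print(f"Created dataset {dataset.dataset_id} in {dataset.location}")
```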
Create a Service Account
Cargo will use this service account to access the APIs enabled earlier. To create a service account, follow these steps:
- Go to the Google Cloud Console.
- Click on IAM & Admin.
- Click on Service Accounts.
- Click on Create service account.
- Give the service account a name.
- Grant the following roles: BigQuery Data Editor, BigQuery Job User, Storage Object User.
- Click on Done.
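For a scripted setup, the service account can be created and the three roles granted through the IAM and Cloud Resource Manager APIs. A sketch under the same assumptions as above (google-api-python-client, application-default credentials); the account ID cargo-sa and project ID are placeholders:

```python
from googleapiclient import discovery

project = "my-gcp-project"  # placeholder: your GCP project ID

# Create the service account.
iam = discovery.build("iam", "v1")
account = iam.projects().serviceAccounts().create(
    name=f"projects/{project}",
    body={"accountId": "cargo-sa",  # placeholder account ID
          "serviceAccount": {"displayName": "Cargo"}},
).execute()
member = f"serviceAccount:{account['email']}"

# Grant the three roles at the project level
# (read-modify-write on the IAM policy).
crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=project, body={}).execute()
for role in ("roles/bigquery.dataEditor",
             "roles/bigquery.jobUser",
             "roles/storage.objectUser"):
    policy.setdefault("bindings", []).append({"role": role, "members": [member]})
crm.projects().setIamPolicy(resource=project, body={"policy": policy}).execute()
```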
Generate a Key
Follow the steps below to generate a key:
- In Service Accounts, click on the created service account.
- Click on Keys.
- Click on Add Key.
- Choose Create new key.
- Select JSON.
- Click on Create. This should download a key file to your computer.
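You can sanity-check the downloaded key before handing it to Cargo. A minimal sketch, assuming the key file was saved as cargo-key.json and the dataset from the earlier step is named cargo_dataset:

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Placeholder path: wherever you saved the downloaded JSON key.
creds = service_account.Credentials.from_service_account_file("cargo-key.json")
client = bigquery.Client(credentials=creds, project=creds.project_id)

# Raises NotFound/Forbidden if the key or its roles are misconfigured.
dataset = client.get_dataset("cargo_dataset")  # placeholder dataset ID
print(f"Key works: {dataset.full_dataset_id} in {dataset.location}")
```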
Set up the system of records
Now that we have all the required elements, navigate to your workspace settings and select "System of records".
Fill in the settings form with the data gathered in the previous steps:
- Copy and paste the content of the service account key file into the field labeled Service Account.
- Select the location chosen when creating the dataset.
- Fill in the name of the bucket created earlier.
- Select Dataset as Scope.
- Fill in the name of the BigQuery dataset created earlier.
- Click Setup.
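If the setup fails, a quick way to isolate permission problems is to exercise the bucket with the same key. A sketch, reusing the placeholder names from the previous snippets:

```python
from google.cloud import storage
from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file("cargo-key.json")
client = storage.Client(credentials=creds, project=creds.project_id)

# Write and delete a throwaway object to confirm the Storage Object User role.
blob = client.bucket("my-cargo-bucket").blob("cargo-smoke-test.txt")
blob.upload_from_string("ok")
blob.delete()
print("Bucket access confirmed")
```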
Setup completed
You are now ready to use Cargo with Google BigQuery!