Use Cargo on AWS Redshift
✅ What Cargo can do
Read data from schemas and tables, even if they are spread across multiple databases
Write it into new schemas and tables
❌ What Cargo will never do
Overwrite existing schemas and tables
(instead it always creates its own schemas and tables when needed)
How it works: Cargo stages data as files in S3 and loads the contents of those files into the database with the COPY command
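For illustration, such a load typically takes the shape of a statement like the one below (schema, table, bucket, and role names are hypothetical; the actual statements are issued by Cargo):

```sql
-- Illustrative example of the kind of COPY statement used to load a staged
-- S3 file into a table in the Cargo-managed database.
COPY cargo_schema.contacts
FROM 's3://cargo-staging-bucket/exports/contacts.csv'            -- hypothetical bucket and key
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'     -- role attached to the cluster
FORMAT AS CSV
IGNOREHEADER 1;
```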
Case: You use Cargo's computed column feature on an entity. Cargo creates an external function on Redshift that calls a Lambda function in your AWS account. The Lambda runs the JavaScript code you define for the column in Cargo
Disclaimer: Cargo does not currently support SSH tunneling, so Redshift cannot be reached through private or internal networks. To be accessible, it must be made publicly available
Create a dedicated DB for Cargo needs
All data managed by Cargo will be stored and transformed in a dedicated database on your Redshift cluster
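If you prefer running SQL over using the console, this can be as simple as the following (the database name is only an example):

```sql
-- Run as an admin user on the cluster; pick whatever database name suits you.
CREATE DATABASE cargo;
```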
Create a DB user for Cargo
In order for Cargo to run commands, it must be authenticated as a Redshift user with the necessary permissions on the database you just created above
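A minimal sketch of what this can look like, assuming the dedicated database is called cargo and the user cargo_user (both names are examples; adjust the grants to your own security policy):

```sql
-- Example only: create a user for Cargo and grant it privileges
-- on the dedicated database created above.
CREATE USER cargo_user PASSWORD '<a-strong-password>';
GRANT ALL ON DATABASE cargo TO cargo_user;
```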
Allow Cargo IP
Depending on the configuration of Redshift, it may be necessary to whitelist Cargo's public IP address
To whitelist the IP:
Open the VPC security group used for Redshift
For Redshift Serverless, follow these steps to find the security group:
Go to Workgroup configuration
Go to Network and security
Click on VPC security group
Click on Inbound rules
Click on Edit inbound rules
Click on Add a new rule
Set the following values:
Type: All traffic
Source: Custom
Value: Cargo's public IP address
At the moment, Cargo does not support SSH. Therefore, Redshift cannot be accessed through private or internal networks; to be accessible, it must be made publicly available
DB connection
Cargo will use this information to connect to the Redshift database:
Hostname - Port - Database name
For Redshift Serverless:
Select the workgroup to see the workgroup configuration
Copy the endpoint value
Extract the hostname, port, and database name from the endpoint value.
The endpoint has the following format:
hostname:port/database_name
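For example, a hypothetical endpoint of default-wg.123456789012.eu-west-1.redshift-serverless.amazonaws.com:5439/dev corresponds to the hostname default-wg.123456789012.eu-west-1.redshift-serverless.amazonaws.com, port 5439, and database name dev.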
User - Password
These are the credentials of the user created earlier.
Cross account access
Read me: Cargo uses the COPY and UNLOAD commands to efficiently load and export data. These commands are executed from your Redshift cluster, but they read data from and write data to Cargo's S3 bucket
To enable this functionality, your Redshift cluster needs access to Cargo's S3 bucket. This can be achieved by setting up cross-account access
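Once cross-account access is in place, the COPY and UNLOAD commands typically chain your role with Cargo's role, roughly like this (hypothetical ARNs and bucket name, shown only to illustrate the mechanism):

```sql
-- Illustration only: UNLOAD exports data to Cargo's S3 bucket by chaining
-- the role attached to your Redshift with the role Cargo created
-- (ARNs are comma-separated, with no spaces).
UNLOAD ('SELECT * FROM cargo_schema.contacts')
TO 's3://cargo-bucket/exports/contacts_'   -- hypothetical Cargo bucket and prefix
IAM_ROLE 'arn:aws:iam::111111111111:role/your-redshift-role,arn:aws:iam::222222222222:role/cargo-role'
FORMAT AS CSV;
```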
Once you submit the form to create the workspace, Cargo creates a role with the necessary access to its S3 bucket for you
To complete the setup, you need to create a policy and a role on AWS that are linked to this Cargo role. Then, attach your role to your Redshift cluster (or namespace if using Redshift Serverless), and provide us with the ARN of that role.
The steps are detailed in this document, but you can also refer to the full AWS documentation here.
Create a policy
Open the IAM console.
Select Policies and click Create policy.
Choose JSON and copy the following policy. Make sure to replace <cargo_role_arn> with the ARN of the role created by Cargo for you.
Click Next, set a meaningful name, and create the policy.
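For reference, a cross-account policy of this kind typically contains a single statement allowing your role to assume the Cargo role, along these lines (an illustrative sketch, not necessarily the exact policy Cargo provides):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AssumeCargoRole",
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "<cargo_role_arn>"
    }
  ]
}
```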
Create a role
Open the IAM console.
Choose Roles and click Create role.
Choose AWS service as Trusted entity type, and select Redshift - Customizable as Use case.
Click Next.
Select the policy created in the previous step, and click Next.
Set a meaningful name
Create the role.
Give Redshift access to AWS Lambda
To compute the values of computed columns using external functions, Redshift needs access to AWS Lambda.
Go back to the IAM console
Find the role created earlier
Select the Trust relationships tab
Click on Edit trust policy and add the following statement (a typical shape is sketched after these steps)
Click on Update policy
Now choose the permissions tab
Click on Add permissions, Attach policies
And attach the following AWS policies:
AWSLambdaExecute
AWSLambdaRole
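The trust statement referenced above generally allows the Redshift service principals to assume the role. The following shape is an assumption for illustration only; the exact statement Cargo requires may differ:

```json
{
  "Sid": "AllowRedshiftToAssumeRole",
  "Effect": "Allow",
  "Principal": {
    "Service": [
      "redshift.amazonaws.com",
      "redshift-serverless.amazonaws.com"
    ]
  },
  "Action": "sts:AssumeRole"
}
```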
Attach the role to Redshift
For Redshift Serverless
Select the namespace of your Redshift.
Open the Security and encryption tab.
Click on Manage IAM roles.
Click on Manage IAM roles again and select Associate IAM roles.
Select the role created earlier, and click on Associate IAM roles
Lambda Functions
What we use it for: In Cargo, you can create computed columns on entities. This lets you generate a column's value from JavaScript code. The column values are computed on the Redshift side, in a user-defined function, when we fetch the data to display it in Cargo. But JavaScript code can't run on Redshift directly.
The solution is to put the JavaScript code inside a Lambda function and create a user-defined function that calls it. For more details, see this AWS documentation.
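The user-defined function is a Redshift Lambda UDF. A simplified sketch of its shape (the function name, Lambda name, and ARN are hypothetical; the real function is generated by Cargo):

```sql
-- Sketch of a Lambda-backed UDF: Redshift forwards the argument to the
-- Lambda function, which runs the JavaScript defined in Cargo and returns the value.
CREATE EXTERNAL FUNCTION cargo_computed_column(VARCHAR)
RETURNS VARCHAR
STABLE
LAMBDA 'cargo-computed-column'                                 -- hypothetical Lambda name
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role';  -- role attached earlier
```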
This means Cargo needs to be able to create Lambda functions in the same AWS account as Redshift.
To do this, you have to create a user for us with the Lambda policies, generate credentials, and copy/paste these credentials in the Redshift setup.
Open the IAM console once again.
Select Users
Click Create user
Give it a name, like "cargo_lambda"
Click Next
Select Attach policies directly
Check AWSLambda_FullAccess
(we are working to improve this part and ask for fewer permissions)
Click Next, and create the user
The user has been created
Select the created user in the users list
Open the Security credentials tab
Scroll down to the Access keys section
Click on Create access key
Choose Third-party service and check the confirmation checkbox
Click next
Add a description, like "Access key for Cargo Redshift setup"
Click Create access key
Copy the Access key in the accessKeyId field, and the Secret access key in the secretAccessKey field on Cargo.
Setup completed 🎉
You are ready to use Cargo!