Introduction to Storage

Cargo workflows are powered by underlying data models, which are structured tables containing organized data. These models can be imported into a workflow to generate runs.

To ensure efficient data management, Cargo leverages a data warehouse infrastructure to host these models. This warehouse collects, manages, and stores data for each workspace, ensuring accuracy, consistency, and accessibility across all data models within that workspace.

Cargo enables users to create data models from a variety of external and internal sources, including API endpoints, webhook events, and SQL queries on warehouse tables. This integration allows users to merge multiple data sources into a unified, cohesive model.

By supporting data model creation from both external sources and existing models within Cargo, the platform provides a flexible way to consolidate diverse datasets under a common schema.

Models support several advanced features, such as:

Creating segments: Apply filters to data models to form segments, allowing for targeted workflows based on specific criteria.
Change-based triggers: Enable workflows to respond in real time to updates, additions, or deletions in the data, so that they act on the most pertinent signals.
Avoiding duplicate runs: Track records to prevent workflows from processing the same data multiple times.
Model relationships: Allow models to connect to each other using common identifiers, providing a unified view.
Custom columns: Use dynamic functions to create new fields that transform and enrich the data, adding calculated values or tags that enhance it.
Setting up a system of record: Use an existing warehouse instance as a system of record that Cargo keeps up to date.

This framework ensures that Cargo users can effectively manage their data, offering a dynamic set of options for feeding data into workflows.

Overview

What you will learn

In this module, you will learn:

What data models are and how to use data connectors to import data from various sources into Cargo.
How to set model relationships and transform data within models to enhance its utility.
How to set up segments and manage filters to trigger workflows.
Which data types are available in Cargo.
How to use custom columns to create dynamic, calculated fields that add value to your data.
How to query and upsert data from within your workflows.

Data connectors

Among all of the integrations available in Cargo, those available as storage connectors allow the user to bring external data to a data model using a data loader. This capability enables you to integrate data from various sources, such as CRMs, marketing automation platforms, and e-commerce databases, into a single, unified model.

In Go-To-Market (GTM) functions, fragmented data about prospects, customers, and their behaviors often limits effective workflow creation, so consolidating that data into one model is especially valuable. See the section on integration for more information on how this works.

There are three types of data loaders available in Cargo:

API data loader: This loader allows you to connect to an external API endpoint to fetch data and import it into a model.
Webhook data loader: This loader allows you to listen for events from an external source and import the data into a model.
SQL data loader: This loader allows you to run SQL queries on tables in the data warehouse and import the results into a model.

Because webhook-based data loaders create non-standard data structures, they have their own set of rules. Once a webhook-based model is created, its columns are strongly typed. For instance, if column A was configured as an integer, it cannot be changed to a string for any subsequent records.
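
The strong-typing rule can be pictured with a minimal sketch. This is not Cargo's implementation: the `TypedModel` class, the `type_name` helper, and the type labels are hypothetical names chosen for illustration, and the schema is assumed to be fixed when the model is created.

```python
# Hypothetical sketch of strongly typed webhook columns; not Cargo's API.
def type_name(value):
    """Map a Python value to a coarse type label."""
    if isinstance(value, bool):  # bool is checked first: it subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


class TypedModel:
    """Rejects records whose values do not fit the configured column types."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> type label
        self.records = []

    def ingest(self, record):
        for column, value in record.items():
            expected = self.schema.get(column)
            actual = type_name(value)
            if expected is not None and expected != actual:
                raise TypeError(f"column {column!r} is {expected}, got {actual}")
        self.records.append(record)


model = TypedModel({"A": "number"})
model.ingest({"A": 1})            # fits the configured type
try:
    model.ingest({"A": "one"})    # a string no longer fits column A
    rejected = False
except TypeError:
    rejected = True
```

In this sketch the second record is refused rather than silently coerced, which mirrors the rule that a column's type cannot change for subsequent records.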

Model relationships

Relationships between models in Cargo are useful for creating a comprehensive view of your data. By connecting different models using common identifiers, you can create merged views of your customer data from different sources.

Defining relationships involves specifying how records in one model relate to records in another model. This includes determining the type of relationship (such as one-to-one, one-to-many, or many-to-many), and mapping the identifier columns between the models. Relating models to one another can provide richer context for decision-making and workflow triggers.

For instance, consider the example of connecting Salesforce data with Stripe invoice data. By linking the 'SalesforceAccounts' model with the 'StripeInvoices' model using a common identifier such as 'CustomerID', you can create a relationship that combines CRM data with financial data. This integrated view allows you to see how customer interactions in Salesforce relate to their invoicing history in Stripe.
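
As a rough illustration of the Salesforce and Stripe example, the sketch below links two in-memory lists on 'CustomerID' as a one-to-many relationship. The sample records and the `relate` helper are assumptions made for this example; in Cargo, relationships are configured on the models rather than written as code.

```python
from collections import defaultdict

# Hypothetical sample records standing in for the two models.
salesforce_accounts = [
    {"CustomerID": "c1", "name": "Acme"},
    {"CustomerID": "c2", "name": "Globex"},
]
stripe_invoices = [
    {"CustomerID": "c1", "amount": 120.0},
    {"CustomerID": "c1", "amount": 80.0},
    {"CustomerID": "c2", "amount": 50.0},
]


def relate(parents, children, key):
    """Attach to each parent the child records sharing its identifier."""
    by_key = defaultdict(list)
    for child in children:
        by_key[child[key]].append(child)
    return [{**parent, "invoices": by_key.get(parent[key], [])} for parent in parents]


merged = relate(salesforce_accounts, stripe_invoices, "CustomerID")
```

Each account in `merged` now carries its own invoices, which is the kind of combined CRM and financial view the relationship is meant to provide.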

Filters and segment triggers

By applying filters to data models, users can create focused subsets of data within the larger model. For example, a filter could be created to narrow down prospects who were participants in a webinar event.

Enabling a workflow allows it to respond to real-time changes in the underlying model's data. When an appropriate change is detected, the workflow can import records from that model to create runs. A workflow periodically resyncs to check for changes according to a schedule or a 'cron job' (except for webhook-based models, which resync automatically when a new record is added). A change counts as appropriate whenever a record's data matches the criteria set in the filter, whether because the record was updated, added, or removed.

Importing records into workflows based on these changes allows users to build workflows that operate autonomously. For example, in a model containing customer data, records can be imported into a workflow whenever a new customer is added or when existing customer data is updated.
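
One way to picture change detection between two syncs is a comparison of snapshots keyed by record ID. The sketch below is illustrative only: `detect_changes` and the snapshot layout are hypothetical and say nothing about how Cargo detects changes internally.

```python
def detect_changes(previous, current):
    """Compare two snapshots of a model, each keyed by record ID."""
    added = [rid for rid in current if rid not in previous]
    removed = [rid for rid in previous if rid not in current]
    updated = [
        rid for rid in current
        if rid in previous and current[rid] != previous[rid]
    ]
    return {"added": added, "updated": updated, "removed": removed}


# Hypothetical customer model before and after a scheduled resync.
previous = {"1": {"plan": "free"}, "2": {"plan": "pro"}}
current = {"1": {"plan": "pro"}, "3": {"plan": "free"}}
changes = detect_changes(previous, current)
```

Here record 1 was updated, record 3 was added, and record 2 was removed; any of these could then be checked against the workflow's filter.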

Another advantage of this system is the ability to track records that enter a workflow. By maintaining a history of imported records, Cargo prevents workflows from creating multiple runs for the same record.

Filter logic can be reused across a Cargo workspace by saving it as a segment. Segments are named sets of filters.
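
The interplay of segments and duplicate prevention can be sketched as follows, under the assumption that a segment is a named filter and that a workflow remembers the IDs it has already imported. The `segments` mapping, the `Workflow` class, and the field names are hypothetical.

```python
# Hypothetical named filters, reusable by any workflow.
segments = {
    "webinar_attendees": lambda record: record.get("attended_webinar") is True,
}


class Workflow:
    """Creates at most one run per record that matches its segment."""

    def __init__(self, segment_name):
        self.matches = segments[segment_name]
        self.imported_ids = set()  # history of records already imported
        self.runs = []

    def sync(self, records):
        for record in records:
            if self.matches(record) and record["id"] not in self.imported_ids:
                self.imported_ids.add(record["id"])
                self.runs.append(record["id"])


workflow = Workflow("webinar_attendees")
records = [
    {"id": 1, "attended_webinar": True},
    {"id": 2, "attended_webinar": False},
]
workflow.sync(records)
workflow.sync(records)  # a second sync of the same data adds no runs
```

Only record 1 matches the segment, and it produces a single run even though the model was synced twice.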

Search and upsert in models

Within workflows, users have the ability to query data from any model in Cargo, ensuring that the data is accessible and actionable when needed.

This capability allows for one-off data retrieval operations without needing to create any merged views.

Similarly, upserts to a model let users push data to custom columns on existing records. This is useful for transactional data, as it allows data to be synchronized across different systems.
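
The two operations can be sketched against an in-memory model. The `search` and `upsert` helpers, the `contacts` dictionary, and the example addresses are hypothetical stand-ins; in Cargo these operations run from within a workflow.

```python
def search(model, **criteria):
    """Return the records whose fields equal all the given criteria."""
    return [
        record for record in model.values()
        if all(record.get(field) == value for field, value in criteria.items())
    ]


def upsert(model, key, fields):
    """Update the record stored under key, or insert it if it is missing."""
    record = model.setdefault(key, {"id": key})
    record.update(fields)
    return record


# Hypothetical model keyed by email address.
contacts = {"ada@example.com": {"id": "ada@example.com", "title": "CEO"}}

upsert(contacts, "ada@example.com", {"last_score_at": "2024-01-01T00:00:00.000Z"})
upsert(contacts, "bob@example.com", {"title": "CTO"})
ceos = search(contacts, title="CEO")
```

The first upsert adds a value to an existing record without touching its other fields, while the second creates a record that did not exist yet.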

Data types

The Cargo storage data model has six data types:

String - e.g. job title
Number - numeric value in integers or decimals
Boolean - true or false value
Date - full timestamp including date, time and timezone
Array - JavaScript array of items of any type (list of strings, numbers etc.)
Object - JavaScript object of nested parameters of any type ({ key: 'value' })

Once set, the type for an existing column cannot be changed. In case of an error, a new column or data model needs to be created. This solution requires all existing records to be processed again, to either ingest them in the new data model or populate the new column.

Alternatively, when historical data back-fill is difficult, a computed column with an appropriate field can be created to cast the value to the correct type.
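
The idea behind such a casting column can be sketched in Python. This is only an illustration of the logic: the real computed column would be written in Cargo's expression syntax, and the `to_number` helper and the `employees` field are hypothetical.

```python
def to_number(value):
    """Cast a value to a number, returning None when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Hypothetical records where a numeric field was ingested as a string.
records = [{"employees": "250"}, {"employees": "n/a"}, {"employees": None}]
for record in records:
    record["employees_number"] = to_number(record["employees"])
```

The original column stays untouched, and the new column holds a correctly typed value wherever the cast succeeds.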

Having the correct data type set on all columns is important for further processing. For instance, the type of a column will decide what kind of filters are available when using it for a workflow trigger. Most connectors will take care of type casting out-of-the-box, but it is important to carefully handle types for generic connectors such as HTTP webhooks.

This is a full example of how data needs to be formatted when using an HTTP connector in order to leverage all data types correctly:

{
  "string_column": "CEO",
  "number_column": 123.12,
  "boolean_column": true,
  "date_column": "2008-09-01T00:00:00.000Z",
  "array_of_strings_column": ["blue", "red"],
  "array_of_numbers_column": [1, 15, 16.1],
  "object_column": {
    "string_column": "automotive",
    "number_column": 34.33
  }
}

In order to get the same result when using an SQL connector with BigQuery:

SELECT
  'CEO' AS string_column,
  123.12 AS number_column,
  TRUE AS boolean_column,
  TIMESTAMP('2008-09-01') AS date_column,
  ['blue', 'red'] AS array_of_strings_column,
  [1, 15, 16.1] AS array_of_numbers_column,
  STRUCT(
    'automotive' AS string_column,
    34.33 AS number_column
  ) AS object_column

In the case of a Snowflake query:

SELECT
  'CEO' AS string_column,
  123.12 AS number_column,
  TRUE AS boolean_column,
  '2008-09-01T00:00:00.000Z'::TIMESTAMP_NTZ AS date_column,
  ARRAY_CONSTRUCT('blue', 'red') AS array_of_strings_column,
  ARRAY_CONSTRUCT(1, 15, 16.1) AS array_of_numbers_column,
  OBJECT_CONSTRUCT(
    'string_column', 'automotive',
    'number_column', 34.33
  ) AS object_column
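
Before sending a payload through a generic connector, it can help to check which of the six types each value would map to. The sketch below is a heuristic under stated assumptions: `cargo_type` is a hypothetical helper, and treating any ISO 8601 string as a Date is an assumption about the connector, not documented behavior.

```python
from datetime import datetime


def cargo_type(value):
    """Guess which of the six data types a JSON value corresponds to."""
    if isinstance(value, bool):  # checked before numbers: bool subclasses int
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, list):
        return "Array"
    if isinstance(value, dict):
        return "Object"
    if isinstance(value, str):
        try:
            # Assumption: ISO 8601 timestamps are the strings read as dates.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return "Date"
        except ValueError:
            return "String"
    raise TypeError(f"unsupported value: {value!r}")


payload = {
    "string_column": "CEO",
    "number_column": 123.12,
    "boolean_column": True,
    "date_column": "2008-09-01T00:00:00.000Z",
    "array_of_strings_column": ["blue", "red"],
    "object_column": {"string_column": "automotive", "number_column": 34.33},
}
types = {column: cargo_type(value) for column, value in payload.items()}
```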

Custom columns

Cargo allows users to create custom columns. These columns let you transform data within your models by adding calculated fields or tags, enhancing the utility of your data. For example, you can categorize customers based on their purchase frequency or calculate their lifetime value.

Custom columns apply dynamic functions to your data models, creating new fields that aren't part of the original dataset. These fields can hold calculated values or tags that categorize records dynamically based on specific criteria.

There are three types of custom columns that you can create in Cargo:

Custom columns

These are free-form columns that you can upsert data into from within a workflow. For example, you could create a custom column called last_score_at to keep track of when scoring was performed for a given record, and use that information to re-enrol records for additional evaluation in the future.

Computed columns

These columns calculate their values from other columns in the model. For example, you could create a computed column that classifies customers as senior or not based on their job title and years of experience. The value is computed using Cargo's expression syntax and can only consider other columns of the same data model. For relation-based columns, use metrics columns.
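
The seniority example could follow logic along these lines. The sketch is written in Python purely to show the rule; an actual computed column uses Cargo's expression syntax, and the field names, keywords, and eight-year threshold are hypothetical.

```python
def is_senior(record):
    """Classify a record as senior from its title or years of experience."""
    title = (record.get("job_title") or "").lower()
    years = record.get("years_of_experience") or 0
    # Hypothetical keywords and threshold, chosen only for illustration.
    has_senior_title = any(word in title for word in ("senior", "head", "vp", "chief"))
    return has_senior_title or years >= 8


by_title = is_senior({"job_title": "Senior Engineer", "years_of_experience": 3})
by_years = is_senior({"job_title": "Analyst", "years_of_experience": 10})
neither = is_senior({"job_title": "Analyst", "years_of_experience": 2})
```

Note that the function reads only fields of the record itself, in line with the restriction that a computed column can only consider columns of the same data model.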

Metrics columns

These columns are used to store aggregated data, such as sums, averages, or counts, based on the values in other columns in other models. For example, you could create a metrics column inside the Salesforce data model that calculates the total revenue generated by each customer from the Stripe data model.
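
The revenue example amounts to an aggregation over a related model. The sketch below shows that aggregation on in-memory data; `total_revenue_by_customer`, the sample records, and the `total_revenue` column name are assumptions made for illustration.

```python
from collections import defaultdict


def total_revenue_by_customer(invoices):
    """Sum invoice amounts per customer identifier."""
    totals = defaultdict(float)
    for invoice in invoices:
        totals[invoice["CustomerID"]] += invoice["amount"]
    return dict(totals)


# Hypothetical records standing in for the Salesforce and Stripe models.
accounts = [{"CustomerID": "c1"}, {"CustomerID": "c2"}]
invoices = [
    {"CustomerID": "c1", "amount": 120.0},
    {"CustomerID": "c1", "amount": 80.0},
]

totals = total_revenue_by_customer(invoices)
for account in accounts:
    # Customers without invoices get a total of zero.
    account["total_revenue"] = totals.get(account["CustomerID"], 0.0)
```

Unlike a computed column, the value here depends on records from another model, which is why it belongs in a metrics column.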

Guides

To help you get started, we have included practical guides that demonstrate how to use storage features effectively.

For example, learn how to set up data loaders, transform data using custom columns, and manage data queries and upserts. See here for instructions on connecting your Google BigQuery or Snowflake data warehouse to Cargo.

These guides provide step-by-step instructions to enhance your understanding and application of storage in real-world scenarios.