⚙️ Advanced concepts
Leverage the semantic layer
Cargo is a headless platform: it plugs into any system of record (BigQuery, Snowflake, Postgres, …) and never duplicates or stores data on its side. For data teams, not owning the storage brings several benefits:
Data governance - You keep control of your data. It is not spread into another system, and you can plug or unplug Cargo at any time without losing anything.
Data access - Any changes you make in your models (with dbt / LookML) are automatically reflected in Cargo, so you don't have to struggle with data syncs across different systems. Conversely, you can use Cargo data in your BI tools.
Cost efficiency - You are not paying several times for the same record.
How it technically works
As mentioned above, all your data stays in your data warehouse. To make this possible, a dedicated database needs to be created, and you grant Cargo read and write access to it. Cargo stores its data in dedicated schemas and tables inside that database.
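For example, on Snowflake the dedicated database and grants could be set up along these lines. This is only a sketch: the database, role and warehouse names are placeholders, not names Cargo prescribes, and the exact privileges may differ for your setup.

```sql
-- Illustrative setup (Snowflake syntax); all names are placeholders.
CREATE DATABASE CARGO_DB;
CREATE ROLE CARGO_ROLE;

-- Let the role see the database and create its own schemas/tables in it.
GRANT USAGE ON DATABASE CARGO_DB TO ROLE CARGO_ROLE;
GRANT CREATE SCHEMA ON DATABASE CARGO_DB TO ROLE CARGO_ROLE;

-- Compute to run the sync queries with.
GRANT USAGE ON WAREHOUSE MY_WH TO ROLE CARGO_ROLE;
```

Other warehouses (BigQuery, Postgres, …) have equivalent permission models; the principle is the same: one dedicated database with read and write access.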
Entity
An entity is a data object that you want to leverage in Cargo. Most of the time, it's going to be a Contact, a Company or an Event.
When you create an entity in Cargo, we ask you to write the SQL query that defines it.
From there, we create a table called ENTITIES_{UUID}.DEFAULT with the results of this SQL query. This table is refreshed on a regular basis (defined in the settings).
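An entity definition query might look like the following. The source table and column names are purely illustrative (for instance, a dbt model you already maintain), not anything Cargo requires:

```sql
-- Hypothetical "Contact" entity definition.
-- analytics.dim_contacts and its columns are example names only.
SELECT
  c.email AS contact_id,  -- unique identifier for the entity
  c.first_name,
  c.last_name,
  c.company_domain
FROM analytics.dim_contacts AS c
WHERE c.email IS NOT NULL;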
Like in a CRM, a user can create custom columns for each entity. To do so, we don't override data in the DEFAULT table; instead, we store the values in another table called ENTITIES_{UUID}.CUSTOM.
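Conceptually, synced and custom values can then be read together with a join. This is a sketch only: {UUID} stands for the identifier Cargo generates, and the join key (record_id) and the custom column (lead_score) are assumptions for illustration:

```sql
-- Combine the synced entity with its custom columns (illustrative names).
SELECT
  d.*,
  cu.lead_score  -- hypothetical user-created custom column
FROM "ENTITIES_{UUID}"."DEFAULT" AS d
LEFT JOIN "ENTITIES_{UUID}"."CUSTOM" AS cu
  ON cu.record_id = d.record_id;
```

Keeping custom values in a separate table means a refresh of DEFAULT can fully rebuild the entity without touching user-created data.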
Segment
Users can filter entities with our query builder and save the result as a segment. The records belonging to the segment are stored in a table called SEGMENTS_{UUID}.DEFAULT, and the changes (records added, removed or updated) are stored in a table called SEGMENTS_{UUID}.CHANGES_{UUID}.
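Because the changes table lives in your warehouse, you can query it directly, for example to audit recent segment movements. The column names below (record_id, change_type, changed_at) are assumptions for illustration, not a documented schema:

```sql
-- Hypothetical look at the last 7 days of segment membership changes.
SELECT
  record_id,
  change_type,  -- e.g. added / removed / updated (assumed values)
  changed_at
FROM "SEGMENTS_{UUID}"."CHANGES_{UUID}"
WHERE changed_at > CURRENT_DATE - 7;
```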
Data stream ingestion
As described above, custom columns give us the ability to write data back to the warehouse. We never update the warehouse tables directly; instead, we have built a pipeline that loads the new data in batches:
Data is pushed to a firehose.
Data is stored in batches in an S3 bucket (which can be yours).
Batches are loaded into a staging table in your warehouse (at no cost).
Records are dispatched to the right tables during the next sync.
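The final dispatch step can be pictured as a merge from the staging table into the entity's CUSTOM table. This is a sketch under assumptions: the staging table name, join key and column are all illustrative, not Cargo's actual internal names:

```sql
-- Illustrative dispatch: upsert staged custom-column values at sync time.
MERGE INTO "ENTITIES_{UUID}"."CUSTOM" AS target
USING staging_custom_values AS src       -- hypothetical staging table
  ON target.record_id = src.record_id
WHEN MATCHED THEN
  UPDATE SET lead_score = src.lead_score
WHEN NOT MATCHED THEN
  INSERT (record_id, lead_score)
  VALUES (src.record_id, src.lead_score);
```

Batching writes this way keeps warehouse load predictable: many small updates become one set-based merge per sync.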