Seeds
Seeds are static datasets that you can load directly into your data warehouse from CSV or Parquet files. They are useful for bringing reference data, such as country codes, business rules, currency rates, or mapping tables, into your transformation workflows without relying on external sources.
Seeds are version-controlled, live in your repository, and are managed like any other part of your dex project. They can be queried, joined, and transformed just like regular models.
When to Use Seeds
Seeds are ideal for:
Lookup tables (e.g. ISO country codes, channel mappings)
Default configuration values for business logic
Small, rarely changing datasets that don’t come from upstream sources
Controlled overrides or test datasets
Using seeds ensures these datasets are transparent, traceable, and tightly integrated with the rest of your pipeline.
File Structure and Format
Seeds are stored in a seeds/ folder in your dex project repository. Each file represents a single table and must use one of the supported formats:
.csv – UTF-8 encoded
.parquet – for typed and compressed data
Example project structure:
seeds/
├── country_codes.csv
└── business_unit_mapping.parquet
dex will automatically create a table in your warehouse with the same name as the file (minus the extension).
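The naming rule above can be checked locally before a build. The following is a minimal Python sketch, not part of dex: it assumes seed files sit directly in seeds/ and that only the .csv and .parquet extensions count as seeds. The `seed_tables` helper is hypothetical, and treating two files with the same stem as a clash is an assumption rather than documented dex behavior.

```python
from pathlib import Path
from typing import Iterable

# Seed formats listed on this page; other files in seeds/ are ignored here.
SUPPORTED_EXTENSIONS = {".csv", ".parquet"}


def seed_tables(paths: Iterable[Path]) -> dict[str, Path]:
    """Map each seed file to the table name derived from it.

    The table name is the file name minus its extension, e.g.
    seeds/country_codes.csv -> country_codes.
    """
    tables: dict[str, Path] = {}
    for path in sorted(paths):
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue  # e.g. README.md or .gitkeep is not a seed
        name = path.stem
        if name in tables:
            # Assumption: x.csv and x.parquet would target the same table,
            # so report the clash instead of silently picking one.
            raise ValueError(f"two seed files map to table {name!r}")
        tables[name] = path
    return tables
```

Against a real project you would call it as `seed_tables(Path("seeds").iterdir())`. It only inspects file names, so it does not validate encoding or file contents.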
Referencing Seeds in Models
You can use the source() function to reference a seed in your models:
select
country_code,
country_name
from {{ source('seeds', 'country_codes') }}
Seeds behave like any other table—they can be joined, filtered, and aggregated in SQL models. You can also preview them directly in dex to inspect their structure and data.
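In a dex model the seed is referenced with source('seeds', 'country_codes'), as in the query above. To show the join-and-aggregate pattern end to end without a warehouse, the Python sketch below uses an in-memory SQLite database as a stand-in. The seed rows, the hypothetical `orders` table, and its columns are invented for illustration; this is not how dex itself loads or queries seeds.

```python
import csv
import io
import sqlite3

# Hypothetical seed contents, shaped like seeds/country_codes.csv.
COUNTRY_CODES_CSV = """country_code,country_name
DE,Germany
FR,France
"""

conn = sqlite3.connect(":memory:")

# Stand-in for the table dex would materialize from the seed file.
conn.execute("CREATE TABLE country_codes (country_code TEXT, country_name TEXT)")
rows = list(csv.DictReader(io.StringIO(COUNTRY_CODES_CSV)))
conn.executemany(
    "INSERT INTO country_codes VALUES (:country_code, :country_name)", rows
)

# Hypothetical upstream model table that only carries raw country codes.
conn.execute("CREATE TABLE orders (order_id INTEGER, country_code TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DE", 10.0), (2, "DE", 5.0), (3, "FR", 7.5)],
)

# Join the seed to enrich the model rows, then aggregate per country.
revenue_by_country = conn.execute(
    """
    SELECT c.country_name, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN country_codes AS c ON o.country_code = c.country_code
    GROUP BY c.country_name
    ORDER BY c.country_name
    """
).fetchall()
print(revenue_by_country)  # [('France', 7.5), ('Germany', 15.0)]
```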
Materialization and Deployment
When you build your project, seeds are materialized as physical tables in your warehouse. This happens once per environment and is tracked as part of the pipeline.
Seeds are loaded into the schema defined for the project/environment
If a seed file changes (e.g. you update a CSV), re-running the seed will reload the latest version
Seeds are safe to use in Flows and downstream models
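Reloading a changed seed amounts to a full refresh: the previous table is replaced rather than appended to, so rows deleted from the file disappear from the warehouse too. The sketch below illustrates that idea with Python's sqlite3 as a stand-in warehouse. `load_seed` is a hypothetical helper; it creates every column as TEXT for brevity, and dex's actual loading mechanism may differ.

```python
import csv
import io
import sqlite3


def load_seed(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Replace `table` with the rows in `csv_text` and return the row count.

    Every column is created as TEXT to keep the sketch short. `table` is
    interpolated into SQL, so it must be a trusted name (here, the seed's
    file name), never user input.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("seed file is empty")
    rows = list(reader)
    columns = ", ".join(f'"{col}" TEXT' for col in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commit when the block exits without an error
        # Full refresh: drop the old copy so removed rows do not linger.
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
load_seed(conn, "channel_mapping", "channel_id,channel_name\n1,Email\n2,Facebook\n")
# The CSV is later edited to drop a row; re-running the load reflects that.
load_seed(conn, "channel_mapping", "channel_id,channel_name\n1,Email\n")
print(conn.execute("SELECT COUNT(*) FROM channel_mapping").fetchone()[0])  # 1
```

A production loader would usually also make the drop-and-recreate step atomic, so that downstream queries never see a missing table; that is left out of this sketch.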
Best Practices
Keep seed files small—seeds are not optimized for large-scale raw ingestion
Use version control to track changes to seed files over time
Document seed usage and structure just like any other model
Store seed files in the seeds/ folder and name them clearly
Avoid sensitive information; seed data is often accessible in plaintext
Example: CSV Seed
A seed file (channel_mapping.csv) might look like:
channel_id,channel_name,is_active
1,Email,true
2,Facebook,true
3,TV,false
This table can then be referenced in models that depend on campaign performance or attribution logic.
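Because CSV carries no type information, something has to decide that channel_id is an integer and is_active is a boolean (one reason Parquet is listed above for typed data). The Python sketch below makes those conversions explicit and then filters to active channels, the kind of lookup an attribution model might perform. The helper names are hypothetical, and how dex itself infers CSV column types is not covered here.

```python
import csv
import io

# Contents of the example seed file above.
CHANNEL_MAPPING_CSV = """channel_id,channel_name,is_active
1,Email,true
2,Facebook,true
3,TV,false
"""


def parse_bool(value: str) -> bool:
    """Parse the true/false literals used in the example seed."""
    lowered = value.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a boolean literal: {value!r}")
    return lowered == "true"


def read_channel_mapping(csv_text: str) -> list[dict]:
    """Read the seed and convert each column to an explicit type.

    CSV stores everything as text, so these types are assumptions about
    what the columns mean, not something the file itself declares.
    """
    return [
        {
            "channel_id": int(row["channel_id"]),
            "channel_name": row["channel_name"],
            "is_active": parse_bool(row["is_active"]),
        }
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


channels = read_channel_mapping(CHANNEL_MAPPING_CSV)
active_names = [c["channel_name"] for c in channels if c["is_active"]]
print(active_names)  # ['Email', 'Facebook']
```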