Seeds

Seeds are static datasets that you load into your data warehouse directly from CSV or Parquet files. They are useful for bringing reference data, such as country codes, business rules, currency rates, or mapping tables, into your transformation workflows without relying on external sources.

Seeds are version-controlled, live in your repository, and are managed like any other part of your dex project. They can be queried, joined, and transformed just like regular models.

When to Use Seeds

Seeds are ideal for:

  • Lookup tables (e.g. ISO country codes, channel mappings)

  • Default configuration values for business logic

  • Small, rarely changing datasets that don’t come from upstream sources

  • Controlled overrides or test datasets

Using seeds ensures these datasets are transparent, traceable, and tightly integrated with the rest of your pipeline.

File Structure and Format

Seeds are stored inside a seeds/ folder in your dex project repository. Each file represents a single table and must follow one of the supported formats:

  • .csv – UTF-8 encoded

  • .parquet – for typed and compressed data

Example project structure:

seeds/
  ├── country_codes.csv
  └── business_unit_mapping.parquet

dex will automatically create a table in your warehouse with the same name as the file (minus the extension).
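The naming rule can be sketched in Python to make it concrete. This is a minimal illustration, not dex's implementation: `seed_table_names` is a hypothetical helper. The check for two files that resolve to the same table name (for example `rates.csv` and `rates.parquet`) is also an assumed safeguard, because this section does not say how dex handles that case.

```python
from pathlib import Path

# Extensions listed above as supported seed formats.
SUPPORTED_EXTENSIONS = {".csv", ".parquet"}


def seed_table_names(seeds_dir: Path) -> dict[str, Path]:
    """Map each seed file in seeds_dir to the table name derived from it.

    The table name is the file name without its extension. Files with other
    extensions and subdirectories are skipped. Hypothetical sketch only.
    """
    tables: dict[str, Path] = {}
    for path in sorted(seeds_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        name = path.stem  # "country_codes.csv" -> "country_codes"
        if name in tables:
            # Assumed safeguard: two files would target the same table.
            raise ValueError(f"duplicate seed table name: {name!r}")
        tables[name] = path
    return tables
```

For the example structure, this would yield the table names `business_unit_mapping` and `country_codes`.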

Referencing Seeds in Models

You can use the source() function to reference a seed in your models:

select
  country_code,
  country_name
from {{ source('seeds', 'country_codes') }}

Seeds behave like any other table—they can be joined, filtered, and aggregated in SQL models. You can also preview them directly in dex to inspect their structure and data.
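To illustrate that a loaded seed needs no special handling in SQL, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the warehouse. This is an assumption made purely for illustration. The `country_codes` rows and the `orders` table are made-up sample data. In a dex model you would reference the seed with `{{ source('seeds', 'country_codes') }}` rather than by its bare table name.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (illustration only).
conn = sqlite3.connect(":memory:")

# The seed table as it might look after loading country_codes.csv.
conn.execute("CREATE TABLE country_codes (country_code TEXT, country_name TEXT)")
conn.executemany(
    "INSERT INTO country_codes VALUES (?, ?)",
    [("DE", "Germany"), ("FR", "France"), ("US", "United States")],
)

# A made-up fact table produced by some other model.
conn.execute("CREATE TABLE orders (order_id INTEGER, country_code TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DE", 120.0), (2, "US", 80.0), (3, "DE", 30.0)],
)

# Join and aggregate against the seed exactly as with any other table.
rows = conn.execute(
    """
    SELECT c.country_name, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN country_codes AS c ON c.country_code = o.country_code
    GROUP BY c.country_name
    ORDER BY revenue DESC
    """
).fetchall()
print(rows)  # [('Germany', 150.0), ('United States', 80.0)]
```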

Materialization and Deployment

When you build your project, seeds are materialized as physical tables in your warehouse. This happens once per environment and is tracked as part of the pipeline.

  • Seeds are loaded into the schema defined for the project/environment

  • If a seed file changes (e.g. you update a CSV), re-running the seed will reload the latest version

  • Seeds are safe to use in Flows and downstream models
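One way to get the reload behavior described above is to replace the seed table wholesale on every run. With this approach, re-running picks up the edited file instead of appending duplicate rows. The sketch below again assumes `sqlite3` as a stand-in warehouse and a hypothetical `reload_seed` helper, and it stores every column as TEXT for simplicity. This section does not specify how dex performs the reload, so treat this as one possible strategy.

```python
import csv
import io
import sqlite3


def reload_seed(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Replace `table` with the rows of csv_text and return the row count.

    Every column is stored as TEXT. `table` and the header names are
    interpolated into SQL, so they must come from trusted seed files.
    Expects no transaction to be open on `conn` when called.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn.execute("BEGIN")  # DROP, CREATE and INSERT succeed or fail together
    try:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    except Exception:
        conn.rollback()  # SQLite DDL is transactional: the old table comes back
        raise
    conn.commit()
    return len(rows)
```

Replacing the whole table makes reloads idempotent, because loading the same file twice leaves the same rows. Wrapping the steps in one transaction means a malformed file leaves the previously loaded version in place.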


Best Practices

  • Keep seed files small—seeds are not optimized for large-scale raw ingestion

  • Use version control to track changes to seed files over time

  • Document seed usage and structure just like any other model

  • Store seed files in the seeds/ folder and name them clearly

  • Avoid sensitive information: seed files are committed to your repository, and CSV seeds in particular are readable in plaintext by anyone with access to it
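Several of these practices can be checked automatically, for example in CI or a pre-commit hook. The sketch below is a hypothetical linter for CSV seeds. The 1 MB limit and the list of suspicious column-name fragments are assumptions to tune per project, not dex settings. A name-based check can only flag likely secrets; it cannot prove that a file contains none.

```python
import csv
import io
from pathlib import Path

# Hypothetical limit: this section gives no specific size cap for seeds.
MAX_SEED_BYTES = 1_000_000
# Heuristic column-name fragments that often indicate secrets or personal data.
SENSITIVE_HINTS = ("password", "secret", "token", "api_key", "ssn")


def lint_csv_seed(path: Path) -> list[str]:
    """Return human-readable problems found in a CSV seed file."""
    problems: list[str] = []
    if path.stat().st_size > MAX_SEED_BYTES:
        problems.append(f"{path.name}: larger than {MAX_SEED_BYTES} bytes")
    try:
        text = path.read_text(encoding="utf-8")  # CSV seeds must be UTF-8
    except UnicodeDecodeError:
        problems.append(f"{path.name}: not valid UTF-8")
        return problems
    header = next(csv.reader(io.StringIO(text)), [])
    if not header:
        problems.append(f"{path.name}: missing header row")
    for column in header:
        if any(hint in column.lower() for hint in SENSITIVE_HINTS):
            problems.append(f"{path.name}: column {column!r} may hold sensitive data")
    return problems
```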


Example: CSV Seed

A seed file (channel_mapping.csv) might look like:

channel_id,channel_name,is_active
1,Email,true
2,Facebook,true
3,TV,false

This table can then be referenced in models that depend on campaign performance or attribution logic.
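Because CSV stores every value as text, whoever consumes the file has to decide how columns such as `channel_id` and `is_active` are typed. Parquet seeds carry their types with them. The sketch below parses the example file in plain Python to make that typing step explicit. The `Channel` dataclass and the strict `true`/`false` boolean rule are assumptions for illustration, not how dex infers CSV column types.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    channel_id: int
    channel_name: str
    is_active: bool


def parse_bool(value: str) -> bool:
    """Accept only 'true'/'false' (case-insensitive); reject anything else."""
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_channel_mapping(csv_text: str) -> list[Channel]:
    """Turn the raw CSV text into typed Channel records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Channel(int(row["channel_id"]), row["channel_name"], parse_bool(row["is_active"]))
        for row in reader
    ]


SEED = """channel_id,channel_name,is_active
1,Email,true
2,Facebook,true
3,TV,false
"""

active = [c.channel_name for c in parse_channel_mapping(SEED) if c.is_active]
print(active)  # ['Email', 'Facebook']
```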
