# Seeds

Seeds are static datasets that you can load directly into your data warehouse from CSV or Parquet files. They are useful for bringing reference data, such as country codes, business rules, currency rates, or mapping tables, into your transformation workflows without relying on external sources.

Seeds are version-controlled, live in your repository, and are managed like any other part of your dex project. They can be queried, joined, and transformed just like regular models.

### When to Use Seeds

Seeds are ideal for:

* Lookup tables (e.g. ISO country codes, channel mappings)
* Default configuration values for business logic
* Small, rarely changing datasets that don’t come from upstream sources
* Controlled overrides or test datasets

Managing these datasets as seeds keeps them transparent and traceable, since every change goes through version control, and tightly integrated with the rest of your pipeline.

### File Structure and Format

Seeds are stored in a `seeds/` folder in your dex project repository. Each file represents a single table and must use one of the supported formats:

* `.csv` – UTF-8 encoded
* `.parquet` – for typed and compressed data

#### Example project structure

```plaintext
seeds/
  ├── country_codes.csv
  └── business_unit_mapping.parquet
```

dex automatically creates a table in your warehouse with the same name as the file, minus the extension. For example, `country_codes.csv` becomes the `country_codes` table.
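The naming rule can be mirrored in a few lines of Python. This is an illustration of the documented behavior for the two example files only, not dex's internal code; `seed_table_name` is a hypothetical name, and how dex treats file names with extra dots or characters your warehouse does not allow is not stated on this page.

```python
from pathlib import Path


def seed_table_name(seed_path: str) -> str:
    """Return the table name for a seed file: the file name without its extension."""
    # Path.stem drops only the final suffix, e.g. ".csv" or ".parquet".
    return Path(seed_path).stem


for seed in ["seeds/country_codes.csv", "seeds/business_unit_mapping.parquet"]:
    print(f"{seed} -> {seed_table_name(seed)}")
# seeds/country_codes.csv -> country_codes
# seeds/business_unit_mapping.parquet -> business_unit_mapping
```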

### Referencing Seeds in Models

Reference a seed in your models with the `source()` function, passing `'seeds'` as the first argument and the seed's table name as the second:

```sql
select
  country_code,
  country_name
from {{ source('seeds', 'country_codes') }}
```

Seeds behave like any other table—they can be joined, filtered, and aggregated in SQL models. You can also preview them directly in dex to inspect their structure and data.

### Materialization and Deployment

When you build your project, seeds are materialized as physical tables in your warehouse. This happens once per environment and is tracked as part of the pipeline.

* Seeds are loaded into the schema defined for the project/environment
* If a seed file changes (e.g. you update a CSV), re-running the seed will reload the latest version
* Seeds are safe to use in Flows and downstream models

***

### Best Practices

* Keep seed files small—seeds are not optimized for large-scale raw ingestion
* Use version control to track changes to seed files over time
* Document seed usage and structure just like any other model
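* Store seed files in the `seeds/` folder and give them clear names, since each file name becomes a table name
* Avoid storing sensitive information in seeds: seed files are committed to your repository, so anyone with repository access can read them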
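To make "keep seed files small" enforceable, a team might add a size check to CI. A minimal Python sketch, assuming a hypothetical 1 MB budget that you would tune yourself (this page does not state a size limit); `oversized_seeds` and `MAX_SEED_BYTES` are illustrative names.

```python
from pathlib import Path

# Hypothetical budget chosen for illustration; dex does not document a limit here.
MAX_SEED_BYTES = 1_000_000  # roughly 1 MB


def oversized_seeds(seeds_dir: Path, max_bytes: int = MAX_SEED_BYTES) -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each seed file larger than max_bytes."""
    oversized = []
    for path in sorted(seeds_dir.iterdir()):
        # Only consider the supported seed formats.
        if path.is_file() and path.suffix.lower() in {".csv", ".parquet"}:
            size = path.stat().st_size
            if size > max_bytes:
                oversized.append((path.name, size))
    return oversized
```

A file that trips this check is usually a sign the data belongs in a regular ingestion source rather than in `seeds/`.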

***

### Example: CSV Seed

A seed file (`channel_mapping.csv`) might look like:

```csv
channel_id,channel_name,is_active
1,Email,true
2,Facebook,true
3,TV,false
```

dex loads this file as a `channel_mapping` table, which models that depend on campaign performance or attribution logic can then reference with `{{ source('seeds', 'channel_mapping') }}`.
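CSV stores every value as text, so a column like `is_active` only becomes a boolean if the loader infers or is given its type; Parquet, by contrast, stores column types with the data. The standard-library Python sketch below shows that difference on the example file. It does not reflect how dex infers CSV column types, which this page does not describe.

```python
import csv
import io

# Contents of channel_mapping.csv from the example above.
CHANNEL_MAPPING_CSV = """channel_id,channel_name,is_active
1,Email,true
2,Facebook,true
3,TV,false
"""

rows = list(csv.DictReader(io.StringIO(CHANNEL_MAPPING_CSV)))

# CSV carries no type information: every value is read back as a string.
print(rows[2])  # {'channel_id': '3', 'channel_name': 'TV', 'is_active': 'false'}

# Types must be assigned explicitly (or inferred) when the data is loaded.
typed = [
    {
        "channel_id": int(row["channel_id"]),
        "channel_name": row["channel_name"],
        "is_active": row["is_active"].strip().lower() == "true",
    }
    for row in rows
]
active_channels = [row["channel_name"] for row in typed if row["is_active"]]
print(active_channels)  # ['Email', 'Facebook']
```

If a seed needs exact column types, storing it as `.parquet` avoids relying on inference.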


