Layers
Modeling your data in layers is one of the most important best practices in building scalable, maintainable, and trustworthy data pipelines. By organizing models into distinct transformation stages, teams can more easily manage complexity, ensure data quality, and collaborate effectively across domains and departments.
Layering provides structure and clarity. It helps data teams isolate responsibilities (e.g. ingestion vs. business logic), trace data lineage, and deploy pipelines incrementally with confidence.
Why Use Layers?
Modularity: Each layer has a focused responsibility, which makes your models easier to understand and maintain.
Observability: You can track issues and errors at the right stage in the pipeline (e.g. ingestion vs. logic).
Reusability: Upstream layers can be shared across multiple use cases (e.g. cleaned tables reused in different marts).
Performance: You can apply caching, optimization, or materialization strategies differently per layer.
Governance: Different stakeholders can own different layers (e.g. Data Engineering owns bronze, Analytics owns gold).
Incremental delivery: Promote data progressively through each layer, validating logic and tests as it moves.
Common Layering Patterns
dex doesn’t enforce a single naming convention, but here are two of the most common and recommended layer structures:
Option 1: Bronze / Silver / Gold
bronze: Raw ingested data from external sources (minimal changes)
silver: Cleaned, typed, and conformed data (joins, filters, types)
gold: Business-ready data marts used for reporting, dashboards, or machine learning
Folder structure:
models/
├── 1.bronze/
├── 2.silver/
└── 3.gold/
Option 2: Raw / Cleaned / Trusted
raw: Exact copy of source tables (e.g. staging from API/DB dumps)
cleaned: Standardized tables with logic applied (naming, filtering)
trusted: Curated datasets with validated business logic
Folder structure:
models/
├── 1.raw/
├── 2.cleaned/
└── 3.trusted/
Naming Conventions
To help with sorting, readability, and execution order, it’s a good practice to prefix folders with numeric indicators:
models/
├── 1.raw/
├── 2.cleaned/
└── 3.trusted/
This helps:
Sort layers logically in code editors and interfaces
Visually distinguish model responsibilities
Avoid errors in large or fast-moving projects
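As a small illustration of the sorting benefit, the Python sketch below orders the example folder names by their numeric prefix. The helper name layer_order is hypothetical and not part of dex. It parses the prefix as an integer on the assumption that a project might grow past nine layers, where plain alphabetical sorting would place a "10." folder before a "2." folder.

```python
def layer_order(folder_name: str) -> int:
    """Return the numeric prefix of a layer folder such as '2.cleaned'."""
    prefix, _, _ = folder_name.partition(".")
    return int(prefix)


# Folder names taken from the example structure, deliberately shuffled.
folders = ["3.trusted", "1.raw", "2.cleaned"]

# Sort by the integer prefix rather than by the raw string.
ordered = sorted(folders, key=layer_order)
print(ordered)
```

With single-digit prefixes, editors that sort alphabetically produce the same order, so the integer parsing only matters once a prefix reaches two digits.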
Layering in Practice
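-- 1.raw/orders.sql
-- Raw layer: pass the source table through unchanged.
select * from {{ source('shopify', 'orders') }}
-- 2.cleaned/orders.sql
-- Cleaned layer: keep only the columns downstream models need and drop cancelled orders.
-- region is selected here because the trusted model below groups by it.
select order_id, customer_id, order_date, region
from {{ ref('raw__orders') }}
where order_status != 'cancelled'
-- 3.trusted/orders_by_region.sql
-- Trusted layer: business-ready aggregate built on the cleaned model.
select region, count(*) as order_count
from {{ ref('cleaned__orders') }}
group by region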
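To try the same three-step flow without a dex project, here is a minimal sketch using Python's standard-library sqlite3 module. It only imitates the templated models above: the object names (source_orders, raw__orders, cleaned__orders, trusted__orders_by_region), the sample rows, and the presence of a region column in the source are all assumptions made for illustration, and plain views and tables stand in for whatever source() and ref() resolve to in dex.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table standing in for the external 'shopify' orders source.
conn.execute(
    "create table source_orders ("
    "order_id integer, customer_id integer, order_date text, "
    "order_status text, region text)"
)
conn.executemany(
    "insert into source_orders values (?, ?, ?, ?, ?)",
    [
        (1, 10, "2024-01-01", "paid", "EU"),
        (2, 11, "2024-01-02", "cancelled", "EU"),
        (3, 12, "2024-01-03", "paid", "US"),
        (4, 10, "2024-01-04", "shipped", "EU"),
    ],
)

# Raw layer: an unchanged copy of the source, exposed as a view.
conn.execute("create view raw__orders as select * from source_orders")

# Cleaned layer: select the needed columns and drop cancelled orders.
conn.execute(
    "create view cleaned__orders as "
    "select order_id, customer_id, order_date, region "
    "from raw__orders where order_status != 'cancelled'"
)

# Final layer: a business-ready aggregate, stored as a table.
conn.execute(
    "create table trusted__orders_by_region as "
    "select region, count(*) as order_count "
    "from cleaned__orders group by region"
)

result = conn.execute(
    "select region, order_count from trusted__orders_by_region order by region"
).fetchall()
print(result)
```

Each statement reads only from the layer directly before it, so a problem in the final counts can be traced back one step at a time.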
Best Practices
Treat each layer as a separate responsibility
Use ref() to clearly define upstream dependencies
Apply tests between layers (e.g. row counts, integrity checks)
Document the purpose of each layer and model
Materialize layers differently based on usage (e.g. views in raw, tables in gold)
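The "tests between layers" practice can be sketched as two plain SQL checks, run here through Python's standard-library sqlite3 module so the example is self-contained. This is not dex's own testing feature: the table names, sample rows, and the helpers row_count and null_count are hypothetical and exist only to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw and cleaned tables; order 3 is missing its customer_id.
conn.executescript(
    """
    create table raw__orders (order_id integer, customer_id integer, order_status text);
    insert into raw__orders values
        (1, 10, 'paid'),
        (2, 11, 'cancelled'),
        (3, null, 'paid');
    create table cleaned__orders as
        select order_id, customer_id
        from raw__orders
        where order_status != 'cancelled';
    """
)


def row_count(table: str) -> int:
    """Count the rows in a table (table names here are trusted literals)."""
    return conn.execute(f"select count(*) from {table}").fetchone()[0]


def null_count(table: str, column: str) -> int:
    """Count the rows where the given column is null."""
    return conn.execute(
        f"select count(*) from {table} where {column} is null"
    ).fetchone()[0]


# Row-count check: the cleaned layer should lose exactly the cancelled orders.
dropped = row_count("raw__orders") - row_count("cleaned__orders")
cancelled = conn.execute(
    "select count(*) from raw__orders where order_status = 'cancelled'"
).fetchone()[0]
row_count_ok = dropped == cancelled

# Integrity check: every cleaned order should reference a customer.
missing_customers = null_count("cleaned__orders", "customer_id")

print(row_count_ok, missing_customers)
```

In this sample the row-count check passes while the integrity check reports one order without a customer, which is the kind of issue worth catching before data is promoted to the next layer.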