Layers

Modeling your data in layers is one of the most important practices for building scalable, maintainable, and trustworthy data pipelines. By organizing models into distinct transformation stages, teams can manage complexity, enforce data quality, and collaborate across domains and departments.

Layering provides structure and clarity. It helps data teams isolate responsibilities (e.g. ingestion vs. business logic), trace data lineage, and deploy pipelines incrementally with confidence.

Why Use Layers?

Modularity: Each layer has a focused responsibility, which makes your models easier to understand and maintain.
Observability: You can trace issues and errors to the stage of the pipeline where they occur (e.g. ingestion vs. logic).
Reusability: Upstream layers can be shared across multiple use cases (e.g. cleaned tables reused in different marts).
Performance: You can apply caching, optimization, or materialization strategies differently per layer.
Governance: Different stakeholders can own different layers (e.g. Data Engineering owns bronze, Analytics owns gold).
Incremental delivery: Promote data progressively through each layer, validating logic and tests as it moves.

Common Layering Patterns

dex doesn’t enforce a single naming convention, but here are two of the most common and recommended layer structures:

Option 1: Bronze / Silver / Gold

Layer     Purpose
bronze    Raw ingested data from external sources (minimal changes)
silver    Cleaned, typed, and conformed data (joins, filters, types)
gold      Business-ready data marts used for reporting, dashboards, or machine learning

Folder structure:

models/
  ├── 1.bronze/
  ├── 2.silver/
  └── 3.gold/

Option 2: Raw / Cleaned / Trusted

Layer     Purpose
raw       Exact copy of source tables (e.g. staging from API/DB dumps)
cleaned   Standardized tables with logic applied (naming, filtering)
trusted   Curated datasets with validated business logic

Folder structure:

models/
  ├── 1.raw/
  ├── 2.cleaned/
  └── 3.trusted/

Naming Conventions

To help with sorting, readability, and execution order, it’s a good practice to prefix folders with numeric indicators:

models/
  ├── 1.raw/
  ├── 2.cleaned/
  └── 3.trusted/

This helps:

Sort layers logically in code editors and interfaces
Visually distinguish model responsibilities
Avoid errors in large or fast-moving projects
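
The numeric prefix can also be read by tooling. As a rough sketch (the helper names below are hypothetical and not part of dex), this is one way to recover the layer order from folder names in Python:

```python
import re

# Hypothetical helpers for illustration only; dex does not ship these.
# Matches folder names such as "1.raw" or "2.cleaned".
_LAYER_DIR = re.compile(r"^(\d+)\.(\w+)$")


def parse_layer(folder_name: str) -> tuple[int, str]:
    """Split a numbered layer folder into (position, layer name)."""
    match = _LAYER_DIR.match(folder_name)
    if match is None:
        raise ValueError(f"not a numbered layer folder: {folder_name!r}")
    return int(match.group(1)), match.group(2)


def ordered_layers(folder_names: list[str]) -> list[str]:
    """Return layer names sorted by numeric prefix rather than by string order."""
    return [name for _, name in sorted(parse_layer(f) for f in folder_names)]


print(ordered_layers(["3.trusted", "1.raw", "2.cleaned"]))
```

Sorting on the parsed integer matters once a project grows past nine layers: a plain string sort places "10.exports" before "2.cleaned". Zero-padded prefixes (01., 02.) keep editors and file browsers in the same order.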

Layering in Practice

-- 1.raw/orders.sql
select * from {{ source('shopify', 'orders') }}

-- 2.cleaned/orders.sql
select order_id, customer_id, order_date, region
from {{ ref('raw__orders') }}
where order_status != 'cancelled'

-- 3.trusted/orders_by_region.sql
select region, count(*) as order_count
from {{ ref('cleaned__orders') }}
group by region
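
To watch the three stages run end to end outside dex, here is a minimal sketch using Python's built-in sqlite3 module. The shopify_orders table, its sample rows, and the region column are assumptions made for illustration. Plain views and a table stand in for whatever dex does when it resolves source() and ref().

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- Assumed stand-in for the external source (shopify.orders).
    create table shopify_orders (
        order_id integer, customer_id integer, order_date text,
        order_status text, region text
    );
    insert into shopify_orders values
        (1, 10, '2024-01-05', 'paid',      'EMEA'),
        (2, 11, '2024-01-06', 'cancelled', 'EMEA'),
        (3, 12, '2024-01-07', 'paid',      'APAC'),
        (4, 10, '2024-01-08', 'shipped',   'EMEA');

    -- First layer: exact copy of the source.
    create view raw__orders as
    select * from shopify_orders;

    -- Second layer: explicit columns, cancelled orders filtered out.
    create view cleaned__orders as
    select order_id, customer_id, order_date, region
    from raw__orders
    where order_status != 'cancelled';

    -- Final layer: business-ready aggregate, stored as a table.
    create table orders_by_region as
    select region, count(*) as order_count
    from cleaned__orders
    group by region;
""")

rows = conn.execute(
    "select region, order_count from orders_by_region order by region"
).fetchall()
print(rows)  # [('APAC', 1), ('EMEA', 2)]
```

Each stage reads only from the stage directly before it, so a change to the cancellation filter touches one model and flows downstream from there.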

Best Practices

Treat each layer as a separate responsibility
Use ref() to clearly define upstream dependencies
Apply tests between layers (e.g. row counts, integrity checks)
Document the purpose of each layer and model
Materialize layers differently based on usage (e.g. views in raw, tables in gold)
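
As an illustration of testing between layers, the sketch below runs a row-count check and two integrity checks (non-null and unique key) against a downstream relation. It uses Python's sqlite3 module rather than dex's own testing features, and check_layer, the table names, and the sample data are all hypothetical.

```python
import sqlite3


def scalar(conn, sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


def check_layer(conn, upstream, downstream, key):
    """Return a list of failure messages; an empty list means all checks passed.

    Relation and column names are interpolated directly into the SQL,
    so pass only trusted identifiers.
    """
    failures = []

    # Row-count check: a filtering layer should not emit more rows than its input.
    up = scalar(conn, f"select count(*) from {upstream}")
    down = scalar(conn, f"select count(*) from {downstream}")
    if down > up:
        failures.append(f"{downstream} has more rows than {upstream}")

    # Integrity check: the key must never be null.
    nulls = scalar(conn, f"select count(*) from {downstream} where {key} is null")
    if nulls:
        failures.append(f"{downstream} has {nulls} null {key} value(s)")

    # Integrity check: the key must be unique.
    dupes = scalar(
        conn,
        f"select count(*) from (select {key} from {downstream} "
        f"group by {key} having count(*) > 1)",
    )
    if dupes:
        failures.append(f"{downstream} has {dupes} duplicated {key} value(s)")

    return failures


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table raw_orders (order_id integer, order_status text);
    insert into raw_orders values
        (1, 'paid'), (2, 'cancelled'), (3, 'paid'), (3, 'paid');

    -- Deduplicated cleaned layer.
    create view cleaned_orders as
    select distinct order_id from raw_orders where order_status != 'cancelled';

    -- Same filter without deduplication, to show a failing check.
    create view broken_orders as
    select order_id from raw_orders where order_status != 'cancelled';
""")

print(check_layer(conn, "raw_orders", "cleaned_orders", "order_id"))
print(check_layer(conn, "raw_orders", "broken_orders", "order_id"))
```

The first call returns an empty list, while the second reports the duplicated order_id. Running checks like these at each layer boundary keeps a bad load in raw from propagating into the final layer unnoticed.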


