Models

Models are the foundation of any development workflow in dex. A model represents a transformation step in your data pipeline—typically written as a SQL SELECT statement or a Python script. Models define how raw or intermediate data should be cleaned, reshaped, or joined to produce analytics-ready outputs.

Creating Models

What is a Model?

A model is a file that contains logic for transforming data. In dex, models are version-controlled files saved in your repository with a .sql or .py extension. Each model should represent a single transformation step and is designed to be simple, composable, and reusable.

You can write a model to:

  • Clean and standardize raw inputs

  • Join multiple datasets

  • Aggregate data for reporting

  • Filter, rename, or reformat columns

  • Output the result into a physical table or view

SQL models are most common and follow a clear structure: they contain a single SELECT statement that defines the output of the transformation. Python models are also supported and are ideal for advanced logic, machine learning steps, or integrating custom libraries.

Creating a Model

To create a new model in dex, follow the steps below:

  1. Navigate to the Develop menu from the left-hand sidebar.

  2. In the Explorer tab, browse to the folder where you want to add the model. You can use one of the layer folders such as 1.raw, 2.cleaned, or 3.trusted.

  3. Click on + Add file, or right-click on the folder and select New File.

  4. Give your model a name using only lowercase letters and underscores (e.g., payments_analysis or order_payments). Avoid using capital letters, spaces, or special characters.

  5. Be sure to use the correct file extension:

    • Use .sql for SQL-based models

    • Use .py for Python-based models

  6. Once the file is created, click on it to open the code editor on the right panel.

  7. Start writing your model logic directly in the editor. You can use SQL SELECT statements, Jinja templating, or Python functions depending on the model type.

After saving your changes, you can preview, run, and validate the model in place.

File Structure

Models are stored inside your project repository under the models/ directory. You can organize them into subfolders (e.g., raw/, staging/, marts/, finance/, etc.) to reflect your team's workflow or business domain.

models/
  └── raw/
      └── orders.sql
  └── staging/
      └── customers.sql
  └── marts/
      └── revenue_by_region.sql

Inspecting Existing Tables

Table Description

Before referencing a table in a new model, you can inspect existing tables within your workspace to better understand their structure and determine whether they fit your use case.

To do this, navigate to the Explorer tab in the Develop section. Inside the file tree, right-click on any .sql file representing a model and select Table Description from the context menu.

This will open a right-hand side panel with detailed information about the table, including:

  • Table name and last modified time

  • Storage metadata (partitioning, clustering, size, etc.)

  • Schema including field names, types, and nullability

  • A preview of the raw data

  • Execution logs

  • Lineage visualization for tracking upstream and downstream dependencies

This feature lets you explore datasets directly within dex without needing to write queries manually or switch tools.

Additionally, the right-click menu offers other useful actions:

  • Copy as ref: Copies a ref() statement to your clipboard, e.g., {{ ref('customers_orders') }}

  • Rename: Allows renaming the model file

  • View Lineage: Opens a graphical lineage view of the model’s dependencies

  • Reload Content: Refreshes the model content to reflect the latest saved version

Use these tools to quickly inspect, reference, and reuse existing transformations in your modeling workflow.

Dependencies and Lineage

Models can reference each other using the ref() function. This function establishes a dependency between models and ensures they are built in the correct order.

select *
from {{ ref('transactions') }}

When modeling data in dex, it's important to use the appropriate reference function:

  • ref() is used to reference another model inside your project. It ensures proper dependency resolution and enables dex to build DAGs automatically.

    Example:

    {{ ref('customers_orders') }}
  • source() is used to reference external datasets that are not defined as models (e.g., raw tables in your warehouse). It is used primarily in the bronze layer or when ingesting data.

    Example:

    {{ source('dex-dsm-ecommerce_landing', 'orders') }}

Use ref() for internal model dependencies and source() to bring in raw, external data.

Model Naming Conventions

To keep your project organized and readable, follow clear and consistent naming conventions. Here are some suggestions:

  • Prefix models by layer: raw_, stg_, int_, fct_, dim_

  • Use lowercase and underscores

  • Reflect the business logic in the name (e.g. fct_orders, dim_products, int_revenue_by_region)

Example SQL Model

{{ config(materialized='table') }}

select
  order_id,
  customer_id,
  order_date,
  total_amount
from raw_orders
where order_status = 'completed'

Model Materializations

By default, models in dex are materialized as views, meaning the SQL logic is re-executed each time the data is queried. You can change the materialization strategy to control how and where the results are stored. Supported options include:

  • view: The model is created as a logical view in your warehouse.

  • table: The result is stored as a physical table and refreshed when the model runs.

  • incremental: The model updates only new or changed data, ideal for large datasets.

  • ephemeral: The model is not written to the warehouse and is inlined as a CTE in downstream models.

You can specify the materialization per model using the config block:

{{ config(materialized='table') }}

select
  customer_id,
  sum(total) as total_spent
from orders
group by customer_id

Best Practices

  • Keep each model focused on a single transformation

  • Use consistent formatting, indentation, and comments

  • Add descriptions and column-level documentation (see Documentation)

  • Leverage tests to validate data assumptions (see Tests)

  • Use ref() instead of hardcoding table names

Last updated

Was this helpful?