Snapshots

Snapshots in dex allow you to track how data changes over time. They are especially useful for capturing Slowly Changing Dimensions (SCD), auditing historical records, and preserving point-in-time data for comparison or analysis.

While models define the current state of data, snapshots preserve historical versions by comparing incoming records with previously captured ones. When a change is detected, dex records a new version, allowing you to build a complete change history.

When to Use Snapshots

Use snapshots when you need to:

  • Track changes to customer, product, or order records over time

  • Build Slowly Changing Dimensions (Type 2)

  • Monitor key field changes in a source table

  • Create point-in-time views for analytics or regulatory compliance

  • Detect and store updates without overwriting previous values

How Snapshots Work

Each time a snapshot runs, dex:

  1. Queries the source data

  2. Compares each record against the previous version (based on unique keys and fields to track)

  3. Inserts a new row only if a tracked field has changed

  4. Adds metadata columns like dbt_valid_from, dbt_valid_to, and dbt_scd_id to enable temporal queries

This allows you to write queries like “what did this record look like last week?” or “how many customers changed plans in Q1?”

Defining a Snapshot

Snapshots are stored in the snapshots/ folder of your dex project and written using SQL.

Example

{% snapshot customer_snapshot %}

{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='check',
    check_cols=['email', 'plan']
  )
}}

select
  customer_id,
  email,
  plan,
  created_at
from {{ source('crm', 'customers') }}

{% endsnapshot %}

Explanation:

  • unique_key: Identifies each record (e.g., customer_id)

  • strategy: Defines how changes are detected

    • check: Compares specific columns for changes

    • timestamp: Compares a column value like updated_at

  • check_cols: Columns to monitor for changes

Materialization and Execution

Snapshots are materialized as tables in your warehouse and are updated incrementally over time. Each row represents a valid version of a record, along with timestamps that define its validity range.

Snapshots can be:

  • Run manually from the Develop tab

  • Triggered in Flows

Referencing Snapshots

You can query snapshots just like models:

select *
from {{ ref('customer_snapshot') }}
where dbt_valid_to is null

This gives you the most recent version of each record. To analyze historical changes, you can use dbt_valid_from and dbt_valid_to for time-based filtering.

Snapshot Folder Structure

All snapshots should be stored in the /snapshots/ directory in your dex project repo.

snapshots/
  └── customer_snapshot.sql
  └── subscription_status.sql

Each file defines one snapshot and can be accompanied by documentation and tests.

Best Practices

  • Use snapshots only for datasets that are mutable (i.e., values can change over time)

  • Monitor the size of snapshot tables—over time, they can grow large

  • Partition or archive old snapshot data if needed

  • Track only the columns that matter to reduce noise and improve performance

  • Combine with tests to detect unexpected or invalid changes

Last updated

Was this helpful?