Sync Sources

Sync Sources let users access external datasets in their cloud storage, even if those datasets were not ingested through native dex Connections.

This lets dex work in environments where data already lives in buckets, databases, or warehouses that are managed outside dex’s ingestion engine.

Why Use Sync Sources?

  • Leverage datasets that were uploaded, replicated, or generated externally.

  • Reference tables from external pipelines, data lakes, or manually loaded storage locations.

  • Automatically configure syncs for datasets ingested using dex native connectors.

How to Configure a Sync Source

To configure a new sync source:

  1. Navigate to the Sync Sources menu in the left sidebar.

  2. Click + Add to create a new Sync Source.

  3. Provide the following information:

    • Environment: The environment (e.g. dev, staging, prod) in which this dataset exists.

    • Dataset Name: The exact name of the dataset in your cloud storage. It must match the name used in your warehouse or lake.

  4. Click Save.

dex will search for this dataset in the default project or bucket defined in your environment's configuration.
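To make the lookup concrete, here is a minimal Python sketch of what adding a Sync Source amounts to, assuming each environment maps to a single default project and that the dataset name is compared as an exact string. The ENVIRONMENTS mapping, the SyncSource class, the locate helper and the acme_* project names are all hypothetical illustrations, not part of dex.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical environment configuration: each environment names the default
# project (or bucket) that is searched for a new Sync Source.
ENVIRONMENTS = {
    "dev": {"default_project": "acme_dev"},
    "staging": {"default_project": "acme_staging"},
    "prod": {"default_project": "acme_prod"},
}


@dataclass(frozen=True)
class SyncSource:
    environment: str   # e.g. "dev", "staging", "prod"
    dataset_name: str  # must match the name in cloud storage exactly


def locate(source: SyncSource, datasets_by_project: dict[str, set[str]]) -> str:
    """Return 'project.dataset' if the dataset exists in the environment's default project."""
    project = ENVIRONMENTS[source.environment]["default_project"]
    available = datasets_by_project.get(project, set())
    # Plain string comparison, mirroring the "exact name" requirement:
    # in this sketch "My_Dataset" does not match "my_dataset".
    if source.dataset_name not in available:
        raise LookupError(f"{source.dataset_name!r} not found in {project!r}")
    return f"{project}.{source.dataset_name}"


if __name__ == "__main__":
    # Hypothetical snapshot of the datasets that exist in cloud storage.
    storage = {"acme_prod": {"my_dataset", "legacy_exports"}}
    print(locate(SyncSource("prod", "my_dataset"), storage))  # acme_prod.my_dataset
```

A dataset that lives in a different project or bucket would raise LookupError here, which is why the name and environment you enter need to match where the data actually sits.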

Syncing the Source

Once configured, you’ll need to sync the source so it becomes available during development:

  1. Open the Develop menu in the sidebar.

  2. Inside the Explorer tab, locate the Sync Sources button at the top.

  3. A panel will display all configured Sync Sources and their sync status:

    • 🟢 Green dot = Source is synced and ready to use.

    • 🟡 Yellow dot = Source is not yet synced.

  4. Click any unsynced source to trigger a sync.

dex will automatically generate a .yml file describing the dataset schema, including:

  • Metadata

  • Identifiers

  • List of tables

This YAML file is the final asset required before you can reference the source in your transformation models.
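The three parts of the generated file can be pictured with a small Python sketch, assuming the file is a plain mapping with metadata, identifiers and tables sections. Those key names, the build_source_description and to_yaml helpers, and the acme_prod project are hypothetical; the file dex actually writes may be laid out differently.

```python
from __future__ import annotations


def build_source_description(dataset: str, environment: str, project: str,
                             tables: list[str]) -> dict:
    """Assemble the three sections the generated file is described as holding."""
    return {
        "metadata": {"name": dataset, "environment": environment},
        "identifiers": {"project": project, "dataset": dataset},
        "tables": [{"name": t} for t in sorted(tables)],
    }


def to_yaml(description: dict) -> str:
    """Render the description as simple YAML: two flat sections plus a list of tables."""
    lines = []
    for section in ("metadata", "identifiers"):
        lines.append(f"{section}:")
        for key, value in description[section].items():
            lines.append(f"  {key}: {value}")
    lines.append("tables:")
    for table in description["tables"]:
        lines.append(f"  - name: {table['name']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    desc = build_source_description("my_dataset", "prod", "acme_prod", ["orders", "customers"])
    print(to_yaml(desc), end="")
```

A real generator would more likely call a YAML library such as PyYAML's yaml.safe_dump; the hand-written emitter only keeps the sketch free of dependencies.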

Native Connectors Sync Automatically

If your dataset was ingested through a dex native connector, the Sync Source will be:

  • Automatically created

  • Automatically synced

  • Instantly usable in your transformation layer

Referencing a Source

Once the sync completes, you can use the source() function inside your SQL or Python models like so:

select * from {{ source('my_dataset', 'orders') }}
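Conceptually, the call is a lookup against the synced source description: the source must have been synced, and the table must be listed in its .yml file. Below is a minimal Python sketch of that resolution, assuming the description has already been loaded into a dict and that a reference resolves to project.dataset.table. SYNCED_SOURCES, source and render are hypothetical stand-ins, not dex's templating engine or its Python model API.

```python
import re

# Hypothetical stand-in for loaded .yml files, keyed by source name.
SYNCED_SOURCES = {
    "my_dataset": {
        "identifiers": {"project": "acme_prod", "dataset": "my_dataset"},
        "tables": ["customers", "orders"],
    },
}

# Matches {{ source('name', 'table') }} with optional surrounding whitespace.
SOURCE_CALL = re.compile(r"\{\{\s*source\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*\}\}")


def source(source_name: str, table_name: str) -> str:
    """Resolve a source reference to a fully qualified table name."""
    if source_name not in SYNCED_SOURCES:
        raise KeyError(f"source {source_name!r} has not been synced")
    description = SYNCED_SOURCES[source_name]
    if table_name not in description["tables"]:
        raise KeyError(f"table {table_name!r} is not listed for source {source_name!r}")
    ids = description["identifiers"]
    return f"{ids['project']}.{ids['dataset']}.{table_name}"


def render(sql: str) -> str:
    """Replace each source() call in the SQL text with its resolved table name."""
    return SOURCE_CALL.sub(lambda m: source(m.group(1), m.group(2)), sql)


if __name__ == "__main__":
    print(render("select * from {{ source('my_dataset', 'orders') }}"))
    # select * from acme_prod.my_dataset.orders
```

Failing fast on an unsynced source or an unlisted table mirrors why the sync step and its .yml file come before any model can reference the data.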

For more details on how dex connects to various databases, files, and APIs, continue reading Accessing Data Sources.
