Quickstart: dex with AWS

Get started building data pipelines on AWS with S3 and Athena

In this guide, you'll learn how to set up dex with Amazon Web Services (AWS) using Athena and S3. We'll walk you through:

  • Creating an AWS account and S3 bucket

  • Assigning required IAM roles and policies

  • Connecting dex to Athena and S3

  • Setting up GitHub or GitLab for version control

  • Accessing sample data stored in S3

  • Writing your first SQL model

  • Adding tests and documentation

  • Automating your data pipelines

By the end, you’ll have a fully operational dex project running in your own AWS environment.

Step 1: Prerequisites

Before you begin, make sure you have:

  • AWS account and access

  • A dex account (sign up here)

  • A GitHub or GitLab account for code management

Step 2: Get ready to configure dex
  1. Log in to dex

  2. Change your temporary password, if asked

  3. Select Set Up with AWS

Step 3: Set up your AWS resources

To establish a secure, functional integration between dex and your AWS account, you'll create the following resources:

  • Two dedicated S3 buckets: one for storing data, the other for metadata

  • Two Athena workgroups: one for production, and another for development

  • One IAM user or role: to allow dex to write data to your cloud storage

Before running the CloudFormation template, confirm your AWS region. This determines where your buckets will be created. If your organization doesn’t have a preferred region, we recommend using us-east-1.

Access the CloudFormation Console

Step 1: Create the Stack
  1. Go to the AWS CloudFormation Console and click Create Stack.

  2. Under Prepare template, select Choose an existing template.

  3. For Template source, choose Amazon S3 URL.

  4. Enter the following S3 URL for the dex CloudFormation template:

https://dex-public-assets.s3.us-east-1.amazonaws.com/cloudformation/dex_roles.yaml

This will load the dex automation script and prepare it for configuration.

Step 2: Specify Stack Details
  • In Stack name, enter a name to identify this setup—for example, dex-setup.

Under Parameters:

  • CreateDexUser: Select True to create an IAM user in your AWS account. This user will be used by dex Automation to write data to your cloud. These credentials will never be stored by dex.

  • InstallationID: Enter the unique ID displayed in your dex workspace during the cloud connection step

Step 3: Configure Stack Options

This step allows for advanced configuration; for most users, the defaults are sufficient. We recommend reviewing these two settings to ensure a clean rollback in case of failure.

Under Stack failure options:

  • Set Behavior on provisioning failure to Roll back all stack resources

  • Set Delete newly created resources during a rollback to Use deletion policy

Step 4: Review and Create

Review all the information entered in the previous steps. Once confirmed, click Create Stack to begin provisioning the resources.

If your stack creation fails, reach out to dex Support for assistance.

Step 5: Finalize the Setup in dex

Go to the Outputs tab in your new stack and locate the Access Key ID and Secret Access Key. Copy and paste these credentials into the corresponding fields in the dex platform to complete the connection.

Step 4: Set up your GitHub/GitLab account

dex uses Git for version control, CI/CD, and collaboration. You can choose to set up with GitHub or GitLab. We'll guide you through the GitHub setup since most of our customers are familiar with it.

Create a Personal GitHub Account

Skip this if you already have a GitHub account.

  1. Go to GitHub

  2. Click Sign Up

  3. Verify your email

Create an Organization

  1. Go to Your Organizations in the profile dropdown

  2. Click New Organization

  3. Choose a plan (Free or Team) and click Create Organization

Create an Empty Repository

  1. In your new organization, go to the Repositories tab

  2. Click New Repository

  3. Fill in:

    1. Repository name (e.g. dex-analytics)

    2. Optional description

    3. Set to Private

  4. Do not add a README—repo must be empty

Important: The repository must be brand new and empty (no commits or README).

Create a Git Access Token

How to Create a GitHub Access Token
  1. Go to Settings > Developer Settings > Personal Access Tokens

  2. Click Generate new token (classic)

  3. Configure the token:

    • Name: dex_lakehouse_access_token

    • Expiration: Optional

    • Scopes: Select repo

  4. Click Generate Token

  5. Copy and save the token—input this token in the dex setup

If you prefer to limit access to specific repositories, use a fine-grained personal access token instead of a classic token.

Step 5: Set up your first Connection

In this example, we’ll connect to a sample Postgres database.

  1. In the left-hand menu, go to Connection > New Source

  2. Select Postgres from the connector catalog

  3. Give your connection a name, like Demo Postgres Database Connection

  4. Click Next, then enter the following credentials:

    1. Host: dex-trial.db.aws.dexlabs.io

    2. Port: 5432

    3. User: trial_user

    4. Database: postgres

    5. Password: %7R^RT4N#h#WdjpU2#or@W5

    6. SSH Tunnel: Select Don’t Use SSH Tunnel

  5. Click Test and then Save

  6. Wait a few seconds while dex tests the connection

  7. On the next screen, toggle all datapoints on

  8. Click Run to execute a manual data replication

  9. Click the Runs tab to check your replication status

Step 6: Build your first model

Now that we have raw data connected, we can start modeling it. In this example, we'll organize data into two layers: raw and cleaned. Most organizations build additional layers (like trusted, analytics, or mart) on top of these, but this will give us a solid foundation.

Create Your Model Folders

  1. In the left-hand menu, go to Develop

  2. Right-click the Models folder and create two subfolders:

    • 1.raw

    • 2.cleaned

Create Your First Model: customers.sql

  1. Right-click on the 1.raw folder and create a new file: customers.sql

  2. Paste the following code into the file (update the from clause with your copied source):

    select
        customer_state,
        customer_unique_id,
        regexp_extract(customer_id, '^.{0,3}') as customer_id,
        customer_city,
        customer_zip_code_prefix
    from
        <your_copy_as_source_here>

  3. If you encounter any errors related to the data source configuration, refer to this section in the documentation

  4. Click Save

  5. Click Preview and Run to see the results
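The regexp_extract call in this model keeps only the first three characters of each customer_id. The same truncation can be sketched in Python (for illustration only; this is not part of the model):

```python
import re

def truncate_customer_id(customer_id: str) -> str:
    """Mimic regexp_extract(customer_id, '^.{0,3}'):
    keep at most the first three characters of the value."""
    match = re.match(r"^.{0,3}", customer_id)
    return match.group(0)

print(truncate_customer_id("abcdef123"))  # → abc
```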

Add a Second Model: orders.sql

  1. Right-click 1.raw again and add a new file: orders.sql

  2. Paste in the following query:

    
    select
        order_id,
        customer_id,
        order_status,
        order_purchase_timestamp,
        order_approved_at,
        order_delivered_carrier_date,
        order_delivered_customer_date,
        order_estimated_delivery_date
    from
        <your_copy_as_source_here>
  3. If you encounter any errors related to the data source configuration, refer to this section in the documentation

  4. Click Save

  5. Click Preview and Run to see the results

Add a Third Model: order_payments.sql

  1. Right-click 1.raw again and create: order_payments.sql

  2. Paste in the following code:

        select
            payment_type,
            payment_value,
            payment_installments,
            payment_sequential,
            order_id
        from
            <your_copy_as_source_here>
  3. If you encounter any errors related to the data source configuration, refer to this section in the documentation

  4. Click Save

  5. Click Preview and Run to validate the model

Create a Cleaned Model: customer_orders.sql

  1. Right-click on 2.cleaned and create a file called customer_orders.sql

  2. Add the following code to join data across the three models:

{{
  config(
    tags=['finance']
  )
}}

with

orders_info as (
    select
        order_id,
        customer_id,
        order_purchase_timestamp as order_date,
        order_status
    from
        {{ ref('orders') }}
),

payments_info as (
    select
        payment_type,
        payment_value,
        payment_installments,
        order_id
    from
        {{ ref('order_payments') }}
),

customer_info as (
    select
        customer_id,
        customer_city
    from
        {{ ref('customers') }}
)

select
    o.order_id,
    o.customer_id,
    ci.customer_city,
    py.payment_value,
    py.payment_type,
    py.payment_installments,
    o.order_date,
    o.order_status
from
    orders_info o
left join
    payments_info py on py.order_id = o.order_id
left join
    customer_info ci on ci.customer_id = o.customer_id
  3. Click Save

  4. Click Preview and Run to see the final output
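The left joins in customer_orders.sql keep every order even when no matching payment or customer row exists. A plain-Python sketch of that behavior, using hypothetical sample rows for illustration:

```python
# Hypothetical sample rows; keyed lookups stand in for the join conditions.
orders = [{"order_id": 1, "customer_id": "a"}, {"order_id": 2, "customer_id": "b"}]
payments = {1: {"payment_value": 59.9, "payment_type": "credit_card"}}
customers = {"a": {"customer_city": "sao paulo"}}

result = []
for o in orders:
    py = payments.get(o["order_id"], {})       # left join on order_id
    ci = customers.get(o["customer_id"], {})   # left join on customer_id
    result.append({
        **o,
        "payment_value": py.get("payment_value"),  # None when no payment row
        "customer_city": ci.get("customer_city"),  # None when no customer row
    })

# Order 2 has no payment and no known customer, but it still survives the join.
print(result[1])
```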

Read more about Models in the Develop with dex page

Step 7: Change the way your model is materialized

One of the most powerful features of dex is the ability to control how models are materialized in your warehouse—without changing SQL. With a single configuration value, you can switch models from views to tables, and vice versa.

This gives you the flexibility to optimize performance and cost, while keeping your modeling layer clean and focused on business logic.

By default, all models are materialized as views. You can override this at the directory level, so every model inside that folder uses a different materialization strategy.

Update Your Project Configuration

  1. In your file explorer, open the dbt_project.yml file at the root of your project

  2. Update the project name (line 5) to: dex_lakehouse

  3. Materialize the models in 1.raw and 2.cleaned as tables by adding this configuration under models: (line 28):

...

# Configuring models
# Full documentation: https://dex-labs-1.gitbook.io/wip-dex-docs/project-settings-and-defaults

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dex_lakehouse:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view
    1.raw:
      +materialized: table
      +schema: raw
    2.cleaned:
      +materialized: table
      +schema: cleaned
  4. Save the file

  5. If you want to override materialization for a specific model, you can do it inline at the top of the file using the config() block:

{{
  config(
    materialized='view'
  )
}}

Read more about materialization in the Materialization page

Step 8: Add tests to your models
  • Right-click the 2.cleaned folder in the Explorer panel and select New File.

  • Name the file: customer_orders.yml

  • Paste the following content into the file:

version: 2

models:
  - name: customer_orders
    description: "Joined dataset combining order, payment, and customer information."
    columns:
      - name: order_id
        description: "Unique identifier for each order"
        tests:
          - not_null
          - unique

What This Does

  • Model-level metadata: You are describing the customer_orders model.

  • Column-level tests:

    • not_null: Ensures every row has an order_id

    • unique: Ensures no duplicate order_id exists

Once you've saved the .yml file:

  • Run the model again.

  • The tests will automatically execute right after the model builds.

  • If a test fails, the flow status will change to Failed, and a notification will be sent to the Notification Center for review.

Read more about tests in the Tests page

Step 9: Commit your changes

After building your first set of models in dex—such as customers.sql, orders.sql, order_payments.sql, and customer_orders.sql—and configuring metadata in customer_orders.yml and dbt_project.yml, it's time to commit those changes to version control.

dex provides an integrated Git workflow so you can track, manage, and push your changes directly from the platform.

1. Open the Git Tab

Navigate to the Git tab from the left-hand Develop menu. dex will list all the files you've created or modified since your last commit. In this case, you should see the following:

  • models/1.raw/customers.sql

  • models/1.raw/orders.sql

  • models/1.raw/order_payments.sql

  • models/2.cleaned/customer_orders.sql

  • models/2.cleaned/customer_orders.yml

  • dbt_project.yml

2. Stage Your Changes

Review the file list and click the checkbox next to each file you'd like to include in the commit. Click on any filename to open a diff view that highlights the changes made—ideal for validating updates before they are finalized.

Once you're ready, enter a commit title (e.g., Initial modeling: customer orders pipeline) and optionally include a description to give your team more context.

Click Stage files to commit, then Commit to save your changes locally.

3. Push to Your Git Repository

With the changes committed, you’ll now want to push them to your remote Git repository. Click the Push button at the bottom of the Git panel.

dex will push to the branch configured for the current environment. For example, if you're in the prod environment, it will push to the prod branch in your GitHub or GitLab repository.

Once pushed, your changes become available to the rest of your team and will be picked up by any Flows, automations, or scheduled runs configured on that branch.

Read more about GitHub/GitLab integration in the Version Control with Git page

Step 10: Automate your pipeline with Flow

Now that you've built and tested your data models, it's time to automate the entire workflow. In dex, this is done using Flows—orchestration pipelines that can run on a schedule or be triggered manually.

Creating a Flow to Automate Your Workflow

1. Navigate to the Flows menu

Open the Flows section from the left-hand navigation menu.

2. Create a new flow

Click the New Flow button on the top right. Fill in the required fields:

  • Name: Give your flow a meaningful name (e.g., daily ecommerce pipeline)

  • Description (optional): Briefly describe what the flow will do

Then, configure the schedule:

  • Set the run time to Every day at 04:00 AM

  • Click Continue to proceed
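If you prefer thinking in cron terms, "every day at 04:00 AM" corresponds to the expression below. Whether dex accepts raw cron syntax is an assumption; the schedule picker in the UI is all you need for this step:

```
0 4 * * *
```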

3. Add a Connection node

The first step in your pipeline is data ingestion. Create a Connection node and select the connector you configured in Step 5 of your setup. This ensures your source data is always up-to-date before the transformations run.

4. Add a new Transformation node

Create a Transformation node by selecting:

  • Project: your current working project

  • Environment: the correct environment (e.g., production or dev)

In the Include field, type: +customer_orders.sql

Then press Enter.

This argument tells dex to execute the customer_orders.sql model and all of its upstream dependencies. The + prefix automatically includes every model required for this one to work—no need to manually list them.
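The + prefix is graph selection: starting from customer_orders, dex walks the dependency graph upstream and includes every ancestor. A minimal sketch of that traversal, over a hypothetical dependency map built from the ref() calls in this guide:

```python
# model -> models it depends on (derived from the ref() calls)
deps = {
    "customer_orders": ["orders", "order_payments", "customers"],
    "orders": [],
    "order_payments": [],
    "customers": [],
}

def upstream(model, graph):
    """Return the model plus all of its transitive dependencies,
    i.e. what a '+model' selector resolves to."""
    selected = set()
    stack = [model]
    while stack:
        m = stack.pop()
        if m not in selected:
            selected.add(m)
            stack.extend(graph.get(m, []))
    return selected

print(sorted(upstream("customer_orders", deps)))
# → ['customer_orders', 'customers', 'order_payments', 'orders']
```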

5. Save and Run

Click Save to store your Flow configuration.

Now, run it manually by using the Run Now option. This helps confirm your flow works as expected before it runs on schedule.

6. Done!

Your ingestion and transformation flow is now live and will run automatically every day at 04:00 AM.

7. Monitor Runs

You can view run history and inspect execution results by clicking the Runs tab inside your flow. This includes:

  • Run status (e.g., succeeded or failed)

  • Start and end time

  • Task execution logs

  • Troubleshooting details for each model

Read more about Flows in the Flows and Automation page

Congratulations! You’ve Completed Your First Data Journey

You’ve just built a fully operational data pipeline using dex—from ingestion to transformation to orchestration.

All data you’ve generated and automated is now stored in your own cloud environment—fully queryable and ready to be consumed by any BI tool, notebook, or data science workflow.

This is a solid foundation that mirrors real-world data engineering best practices. But this is just the beginning.

dex is built to grow with your complexity—whether that’s more data sources, advanced transformations, or multiple teams collaborating on analytics. The rest of our documentation will help you expand your capabilities, customize workflows, and unlock new use cases.

Happy building!
