Quickstart: dex with AWS

Get started building data pipelines on AWS with S3 and Athena

In this guide, you'll learn how to set up dex with Amazon Web Services (AWS) using Athena and S3. We'll walk you through:

  • Creating an AWS account and S3 bucket

  • Assigning required IAM roles and policies

  • Connecting dex to Athena and S3

  • Setting up GitHub or GitLab for version control

  • Accessing sample data stored in S3

  • Writing your first SQL model

  • Adding tests and documentation

  • Automating your data pipelines

By the end, you’ll have a fully operational dex project running in your own AWS environment.

Step 1: Prerequisites

Before you begin, make sure you have:

  • AWS account and access

  • A dex account (sign up here)

  • A GitHub or GitLab account for code management

Step 2: Get ready to configure dex
  1. Log in to dex

  2. Change your temporary password, if asked

  3. Select Set Up with AWS

Step 3: Set up your AWS resources

To establish a secure, functional integration between dex and your AWS account, you'll create the following resources:

  • Two dedicated S3 buckets: one for storing data, the other for metadata

  • Two Athena workgroups: one for production, and another for development

  • One IAM user or role: to allow dex to write data to your cloud storage

Before running the CloudFormation template, confirm your AWS region. This determines where your buckets will be created. If your organization doesn’t have a preferred region, we recommend using us-east-1.

Access the CloudFormation Console

Step 1: Create the Stack
  1. Go to the AWS CloudFormation Console and click Create Stack.

  2. Under Prepare template, select Choose an existing template.

  3. For Template source, choose Amazon S3 URL.

  4. Enter the following S3 URL for the dex CloudFormation template:

https://dex-public-assets.s3.us-east-1.amazonaws.com/cloudformation/dex_roles.yaml

This will load the dex automation script and prepare it for configuration.

Step 2: Specify Stack Details
  • In Stack name, enter a name to identify this setup—for example, dex-setup.

Under Parameters:

  • CreateDexUser: Select True to create an IAM user in your AWS account. This user will be used by dex Automation to write data to your cloud. These credentials will never be stored by dex.

  • InstallationID: Enter the unique ID displayed in your dex workspace during the cloud connection step

Step 3: Configure Stack Options

This step allows for advanced configuration; for most users, the defaults are sufficient. We recommend reviewing these two settings to ensure a clean rollback in case of failure.

Under Stack failure options:

  • Set Behavior on provisioning failure to Roll back all stack resources

  • Set Delete newly created resources during a rollback to Use deletion policy

Step 4: Review and Create

Review all the information entered in the previous steps. Once confirmed, click Create Stack to begin provisioning the resources.

If your stack creation fails, reach out to dex Support for assistance.

Step 5: Finalize the Setup in dex

Go to the Outputs tab in your new stack and locate the Access Key ID and Secret Access Key. Copy and paste these credentials into the corresponding fields in the dex platform to complete the connection.

Step 4: Set up your GitHub/GitLab account

dex uses Git for version control, CI/CD, and collaboration. You can choose to set up with GitHub or GitLab. We'll guide you through the GitHub setup since most of our customers are familiar with it.

Create a Personal GitHub Account

Skip this if you already have a GitHub account.

  1. Go to GitHub

  2. Click Sign Up

  3. Verify your email

Create an Organization

  1. Go to Your Organizations in the profile dropdown

  2. Click New Organization

  3. Choose a plan (Free or Team) and click Create Organization

Create an Empty Repository

  1. In your new organization, go to the Repositories tab

  2. Click New Repository

  3. Fill in:

    1. Repository name (e.g. dex-analytics)

    2. Optional description

    3. Set to Private

  4. Do not add a README—repo must be empty

Important: The repository must be brand new and empty (no commits or README).

Create a Git Access Token

How to Create a GitHub Access Token
  1. Go to Settings > Developer Settings > Personal Access Tokens

  2. Click Generate new token (classic)

  3. Configure the token:

    • Name: dex_lakehouse_access_token

    • Expiration: Optional

    • Scopes: Select repo

  4. Click Generate Token

  5. Copy and save the token—input this token in the dex setup

If you prefer to limit access to specific repositories, use a fine-grained personal access token instead of a classic token.

Step 5: Set up your first Connection

In this example, we’ll connect to a sample Postgres database.

  1. In the left-hand menu, go to Connection > New Source

  2. Select Postgres from the connector catalog

  3. Give your connection a name, like Demo Postgres Database Connection

  4. Click Next, then enter the following credentials:

    1. Host: dex-trial.db.aws.dexlabs.io

    2. Port: 5432

    3. User: trial_user

    4. Database: postgres

    5. Password: %7R^RT4N#h#WdjpU2#or@W5

    6. SSH Tunnel: Select Don’t Use SSH Tunnel

  5. Click Test and then Save

  6. Wait a few seconds while dex tests the connection

  7. On the next screen, toggle all datapoints on

  8. Click Run to execute a manual data replication

  9. Click the Runs tab to check your replication status

Step 6: Build your first model

Now that we have raw data connected, we can start modeling it. In this example, we'll organize data into two layers: raw and cleaned. Most organizations build additional layers (like trusted, analytics, or mart) on top of these, but this will give us a solid foundation.

Create Your Model Folders

  1. In the left-hand menu, go to Develop

  2. Right-click the Models folder and create two subfolders:

    • 1.raw

    • 2.cleaned

Create Your First Model: customers.sql

  1. Right-click on the 1.raw folder and create a new file: customers.sql

  2. Paste the following code into the file (update the from clause with your copied source):

    select
        customer_state,
        customer_unique_id,
        regexp_extract(customer_id, '^.{0,3}') as customer_id,
        customer_city,
        customer_zip_code_prefix
    from
        <your_copy_as_source_here>

  3. If you encounter any errors related to the data source configuration, refer to this section in the documentation

  4. Click Save

  5. Click Preview and Run to see the results
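The regexp_extract call in this model keeps only the first three characters of each customer_id. The same truncation can be sketched in Python (for illustration only; this is not part of the model):

```python
import re

def truncate_customer_id(customer_id: str) -> str:
    """Mimic regexp_extract(customer_id, '^.{0,3}'):
    keep at most the first three characters of the value."""
    match = re.match(r"^.{0,3}", customer_id)
    return match.group(0)

print(truncate_customer_id("abcdef123"))  # → abc
```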

Add a Second Model: orders.sql

  1. Right-click 1.raw again and add a new file: orders.sql

  2. Paste in the following query:

    
    select
        order_id,
        customer_id,
        order_status,
        order_purchase_timestamp,
        order_approved_at,
        order_delivered_carrier_date,
        order_delivered_customer_date,
        order_estimated_delivery_date
    from
        <your_copy_as_source_here>
  3. If you encounter any errors related to the data source configuration, refer to this section in the documentation

  4. Click Save

  5. Click Preview and Run to see the results

Add a Third Model: order_payments.sql

  1. Right-click 1.raw again and create: order_payments.sql

  2. Paste in the following code:

        select
            payment_type,
            payment_value,
            payment_installments,
            payment_sequential,
            order_id
        from
            <your_copy_as_source_here>
  3. If you encounter any errors related to the data source configuration, refer to this section in the documentation

  4. Click Save

  5. Click Preview and Run to validate the model

Create a Cleaned Model: customer_orders.sql

  1. Right-click on 2.cleaned and create a file called customer_orders.sql

  2. Add the following code to join data across the three models:

{{
  config(
    tags=['finance']
  )
}}

with

orders_info as (
    select
        order_id,
        customer_id,
        order_purchase_timestamp as order_date,
        order_status
    from
        {{ ref('orders') }}
),

payments_info as (
    select
        payment_type,
        payment_value,
        payment_installments,
        order_id
    from
        {{ ref('order_payments') }}
),

customer_info as (
    select
        customer_id,
        customer_city
    from
        {{ ref('customers') }}
)

select
    o.order_id,
    o.customer_id,
    ci.customer_city,
    py.payment_value,
    py.payment_type,
    py.payment_installments,
    o.order_date,
    o.order_status
from
    orders_info o
left join
    payments_info py on py.order_id = o.order_id
left join
    customer_info ci on ci.customer_id = o.customer_id
  3. Click Save

  4. Click Preview and Run to see the final output
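The left joins in customer_orders.sql keep every order even when no matching payment or customer row exists. A plain-Python sketch of that behavior, using hypothetical sample rows for illustration:

```python
# Hypothetical sample rows; keyed lookups stand in for the join conditions.
orders = [{"order_id": 1, "customer_id": "a"}, {"order_id": 2, "customer_id": "b"}]
payments = {1: {"payment_value": 59.9, "payment_type": "credit_card"}}
customers = {"a": {"customer_city": "sao paulo"}}

result = []
for o in orders:
    py = payments.get(o["order_id"], {})       # left join on order_id
    ci = customers.get(o["customer_id"], {})   # left join on customer_id
    result.append({
        **o,
        "payment_value": py.get("payment_value"),  # None when no payment row
        "customer_city": ci.get("customer_city"),  # None when no customer row
    })

# Order 2 has no payment and no known customer, but it still survives the join.
print(result[1])
```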

Read more about Models in the Develop with dex page

Step 7: Change the way your model is materialized

One of the most powerful features of dex is the ability to control how models are materialized in your warehouse—without changing SQL. With a single configuration value, you can switch models from views to tables, and vice versa.

This gives you the flexibility to optimize performance and cost, while keeping your modeling layer clean and focused on business logic.

By default, all models are materialized as views. You can override this at the directory level, so every model inside that folder uses a different materialization strategy.

Update Your Project Configuration

  1. In your file explorer, open the dbt_project.yml file at the root of your project

  2. Update the project name (line 5) to: dex_lakehouse

  3. Materialize the models in 1.raw and 2.cleaned as tables by adding this configuration under models: (line 28):

...

# Configuring models
# Full documentation: https://dex-labs-1.gitbook.io/wip-dex-docs/project-settings-and-defaults

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dex_lakehouse:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view
    1.raw:
      +materialized: table
      +schema: raw
    2.cleaned:
      +materialized: table
      +schema: cleaned
  4. Save the file

  5. If you want to override materialization for a specific model, you can do it inline at the top of the file using the config() block:

{{
  config(
    materialized='view'
  )
}}

Read more about materialization in the Materialization page

Step 8: Add tests to your models
  • Right-click the 2.cleaned folder in the Explorer panel and select New File.

  • Name the file: customer_orders.yml

  • Paste the following content into the file:

version: 2

models:
  - name: customer_orders
    description: "Joined dataset combining order, payment, and customer information."
    columns:
      - name: order_id
        description: "Unique identifier for each order"
        tests:
          - not_null
          - unique

What This Does

  • Model-level metadata: You are describing the customer_orders model.

  • Column-level tests:

    • not_null: Ensures every row has an order_id

    • unique: Ensures no duplicate order_id exists

Once you've saved the .yml file:

  • Run the model again.

  • The tests will automatically execute right after the model builds.

  • If a test fails, the flow status will change to Failed, and a notification will be sent to the Notification Center for review.

Read more about tests in the Tests page

Step 9: Commit your changes

After building your first set of models in dex—such as customers.sql, orders.sql, order_payments.sql, and customer_orders.sql—and configuring metadata in customer_orders.yml and dbt_project.yml, it's time to commit those changes to version control.

dex provides an integrated Git workflow so you can track, manage, and push your changes directly from the platform.

1. Open the Git Tab

Navigate to the Git tab from the left-hand Develop menu. dex will list all the files you've created or modified since your last commit. In this case, you should see the following:

  • models/1.raw/customers.sql

  • models/1.raw/orders.sql

  • models/1.raw/order_payments.sql

  • models/2.cleaned/customer_orders.sql

  • models/2.cleaned/customer_orders.yml

  • dbt_project.yml

2. Stage Your Changes

Review the file list and click the checkbox next to each file you'd like to include in the commit. Click on any filename to open a diff view that highlights the changes made—ideal for validating updates before they are finalized.

Once you're ready, enter a commit title (e.g., Initial modeling: customer orders pipeline) and optionally include a description to give your team more context.

Click Stage files to commit, then Commit to save your changes locally.

3. Push to Your Git Repository

With the changes committed, you’ll now want to push them to your remote Git repository. Click the Push button at the bottom of the Git panel.

dex will push to the branch configured for the current environment. For example, if you're in the prod environment, it will push to the prod branch in your GitHub or GitLab repository.

Once pushed, your changes become available to the rest of your team and will be picked up by any Flows, automations, or scheduled runs configured on that branch.

Read more about GitHub/GitLab integration in the Version Control with Git page

Step 10: Automate your pipeline with Flow

Now that you've built and tested your data models, it's time to automate the entire workflow. In dex, this is done using Flows—orchestration pipelines that can run on a schedule or be triggered manually.

Creating a Flow to Automate Your Workflow

1. Navigate to the Flows menu

Open the Flows section from the left-hand navigation menu.

2. Create a new flow

Click the New Flow button on the top right. Fill in the required fields:

  • Name: Give your flow a meaningful name (e.g., daily ecommerce pipeline)

  • Description (optional): Briefly describe what the flow will do

Then, configure the schedule:

  • Set the run time to Every day at 04:00 AM

  • Click Continue to proceed
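If you prefer thinking in cron terms, "every day at 04:00 AM" corresponds to the expression below. Whether dex accepts raw cron syntax is an assumption; the schedule picker in the UI is all you need for this step:

```
0 4 * * *
```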

3. Add a Connection node

The first step in your pipeline is data ingestion. Create a Connection node and select the connector you configured in Step 5 of your setup. This ensures your source data is always up-to-date before the transformations run.

4. Add a new Transformation node

Create a Transformation node by selecting:

  • Project: your current working project

  • Environment: the correct environment (e.g., production or dev)

In the Include field, type: +customer_orders.sql

Then press Enter.

This argument tells dex to execute the customer_orders.sql model and all of its upstream dependencies. The + prefix automatically includes every model required for this one to work—no need to manually list them.
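The + prefix is graph selection: starting from customer_orders, dex walks the dependency graph upstream and includes every ancestor. A minimal sketch of that traversal, over a hypothetical dependency map built from the ref() calls in this guide:

```python
# model -> models it depends on (derived from the ref() calls)
deps = {
    "customer_orders": ["orders", "order_payments", "customers"],
    "orders": [],
    "order_payments": [],
    "customers": [],
}

def upstream(model, graph):
    """Return the model plus all of its transitive dependencies,
    i.e. what a '+model' selector resolves to."""
    selected = set()
    stack = [model]
    while stack:
        m = stack.pop()
        if m not in selected:
            selected.add(m)
            stack.extend(graph.get(m, []))
    return selected

print(sorted(upstream("customer_orders", deps)))
# → ['customer_orders', 'customers', 'order_payments', 'orders']
```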

5. Save and Run

Click Save to store your Flow configuration.

Now, run it manually by using the Run Now option. This helps confirm your flow works as expected before it runs on schedule.

6. Done!

Your ingestion and transformation flow is now live and will run automatically every day at 04:00 AM.

7. Monitor Runs

You can view run history and inspect execution results by clicking the Runs tab inside your flow. This includes:

  • Run status (e.g., succeeded or failed)

  • Start and end time

  • Task execution logs

  • Troubleshooting details for each model

Read more about Flows in the Flows and Automation page

Congratulations! You’ve Completed Your First Data Journey

You’ve just built a fully operational data pipeline using dex—from ingestion to transformation to orchestration.

All data you’ve generated and automated is now stored in your own cloud environment—fully queryable and ready to be consumed by any BI tool, notebook, or data science workflow.

This is a solid foundation that mirrors real-world data engineering best practices. But this is just the beginning.

dex is built to grow with your complexity—whether that’s more data sources, advanced transformations, or multiple teams collaborating on analytics. The rest of our documentation will help you expand your capabilities, customize workflows, and unlock new use cases.

Happy building!
