Quickstart dex with GCP

Get started building data pipelines using GCP and BigQuery

In this guide, you'll learn how to set up dex with Google Cloud and BigQuery. We'll walk you through:

Creating a Google Cloud Project
Assigning required roles and credentials
Connecting dex to BigQuery
Setting up GitHub or GitLab for version control
Accessing sample data in a public dataset
Writing your first SQL model
Adding tests and documentation
Automating your data pipelines

By the end, you’ll have a fully operational dex project running in your own cloud.

Step 1: Prerequisites

Before you begin, make sure you have:

A Google Cloud account
A dex account (sign up here)
A GitHub or GitLab account for code management

Step 2: Set Up your GCP Account

To complete the "Connect to Your Cloud Account" step in the dex platform, you'll need to configure a few resources in your GCP console. Once this is complete, you’ll receive a Service Account JSON key to use within dex.

Create a GCP Project for dex

Open the Google Cloud Console.
Click New Project.
Fill in the required fields:
- Name your project something like your-org-lakehouse
- Choose a Location; check with your SRE or infra team to pick the one with the best cost/performance trade-off
Click Create

Tip: In production, you may want separate projects for dev and prod. For this quickstart, one is enough.

Create Cloud Credentials

dex needs specific GCP permissions to operate. We’ve made this easy with a setup script.

For reference, these are the permissions dex needs (generated by the setup script). You can read the full list of permissions and what they do here.

storage.managedFolders.delete
storage.managedFolders.get
storage.managedFolders.list
storage.multipartUploads.abort
storage.multipartUploads.create
storage.multipartUploads.list
storage.multipartUploads.listParts
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
storage.objects.restore
storage.objects.update
bigquery.datasets.create
bigquery.datasets.get
bigquery.datasets.getIamPolicy
bigquery.jobs.create
bigquery.models.getMetadata
bigquery.models.list
bigquery.routines.get
bigquery.routines.list
bigquery.tables.create
bigquery.tables.get
bigquery.tables.getData
bigquery.tables.getIamPolicy
bigquery.tables.list
bigquery.tables.update
bigquery.tables.updateData
dataplex.projects.search
resourcemanager.projects.get
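
If you manage roles yourself instead of relying on the setup script, a small check like the one below can tell you which of the permissions above are still missing. This is only a sketch: the function names are hypothetical, and it assumes you already have the list of permissions granted to your service account as plain strings.

```python
# Permissions listed in this guide, copied from the list above.
REQUIRED_PERMISSIONS = {
    "storage.managedFolders.delete", "storage.managedFolders.get",
    "storage.managedFolders.list", "storage.multipartUploads.abort",
    "storage.multipartUploads.create", "storage.multipartUploads.list",
    "storage.multipartUploads.listParts", "storage.objects.create",
    "storage.objects.delete", "storage.objects.get", "storage.objects.list",
    "storage.objects.restore", "storage.objects.update",
    "bigquery.datasets.create", "bigquery.datasets.get",
    "bigquery.datasets.getIamPolicy", "bigquery.jobs.create",
    "bigquery.models.getMetadata", "bigquery.models.list",
    "bigquery.routines.get", "bigquery.routines.list",
    "bigquery.tables.create", "bigquery.tables.get",
    "bigquery.tables.getData", "bigquery.tables.getIamPolicy",
    "bigquery.tables.list", "bigquery.tables.update",
    "bigquery.tables.updateData", "dataplex.projects.search",
    "resourcemanager.projects.get",
}


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


def group_by_service(permissions):
    """Group permission names by service prefix (the text before the first dot)."""
    groups = {}
    for permission in sorted(permissions):
        groups.setdefault(permission.split(".", 1)[0], []).append(permission)
    return groups
```

An empty list from missing_permissions means every permission in this guide is covered.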

Set Up GCP Credentials with Cloud Shell

To configure access between dex and BigQuery, follow these steps using Google Cloud Shell:

1. Open the GCP Console

2. In the top bar, click on the project selector and choose the project you created or reserved for dex.

3. Launch Cloud Shell Editor - Use the search bar at the top of the page to search for Cloud Shell Editor and click to open it.

4. Inside the Cloud Shell Editor, click the "Open Terminal" button near the top of the screen.

5. Paste the following command into the terminal to download the setup script:

wget https://raw.githubusercontent.com/dexlabsio/terraform-modules/refs/heads/main/gcp/cloud-shell/bigquery-service-account-setup.py

6. Set your GCP project ID in the shell environment

export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project)

7. Execute the setup script

python3 bigquery-service-account-setup.py

8. Download your Service Account Key

After the script finishes, it displays a Service Account JSON key on the screen.

Download and store this key securely. You’ll need it when connecting dex to BigQuery.
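
Before pasting the key into dex, you may want to confirm that the file you saved is complete. The sketch below checks for fields that Google service account key files normally contain. The function name is hypothetical, and the check does not contact GCP or verify that the key is actually valid.

```python
import json

# Fields normally present in a Google service account key file.
EXPECTED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_service_account_key(raw_json):
    """Return a list of problems found in the key text; an empty list means it looks complete."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    # Report every expected field that is absent or empty.
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if not key.get(name)]
    # A key of another credential type (for example a user credential) will not work here.
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']!r}")
    return problems
```

A truncated copy-paste from the terminal usually shows up as invalid JSON or a missing private_key.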

Step 3: Set up your GitHub/GitLab account

dex uses Git for version control, CI/CD, and collaboration. You can choose to set up with GitHub or GitLab. We'll guide you through the GitHub setup since most of our customers are familiar with it.

Create a Personal GitHub Account

Skip this if you already have a GitHub account.

Go to GitHub
Click Sign Up
Verify your email

Create an Organization

Go to Your Organizations in the profile dropdown
Click New Organization
Choose a plan (Free or Team) and click Create Organization

Create an Empty Repository

In your new organization, go to the Repositories tab
Click New Repository
Fill in:
1. Repository name (e.g. dex-analytics)
2. Optional description
3. Set to Private
Do not add a README; the repository must be empty

Important: The repository must be brand new and empty (no commits or README).

Create a Git Access Token

Go to Settings > Developer Settings > Personal Access Tokens
Click Generate new token (classic)
Configure the token:
- Name: dex_lakehouse_access_token
- Expiration: Optional
- Scopes: Select repo
Click Generate Token
Copy and save the token. You'll enter it during the dex setup

If you prefer to limit access to specific repositories, use a fine-grained personal access token instead of a classic token.
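
If you'd like to confirm that the token can see your new repository before entering it in dex, one option is to call the GitHub REST API yourself. The sketch below only builds the request; the function name is hypothetical, and the owner, repository, and token values are placeholders you would replace with your own.

```python
import urllib.request


def build_repo_request(owner, repo, token):
    """Build (but do not send) a GitHub REST API request for a repository's metadata."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    return urllib.request.Request(url, headers={
        # GitHub accepts personal access tokens as a Bearer credential.
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
```

Sending the request with urllib.request.urlopen should succeed when the token has access to the repository and raise an HTTP error (typically 401 or 404) when it does not.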

Step 4: Connect dex to GCP and Git provider

Log in to dex
Change your temporary password, if asked
Select Set Up with GCP

Step 5: Set up your first Connection

In this example, we’ll connect to a sample Postgres database.

In the left-hand menu, go to Connection > New Source

Select Postgres from the connector catalog
Give your connection a name, like Demo Postgres Database Connection
Click Next, then enter the following credentials:
1. Host: dex-trial.db.aws.dexlabs.io
2. Port: 5432
3. User: trial_user
4. Database: postgres
5. Password: %7R^RT4N#h#WdjpU2#or@W5
6. SSH Tunnel: Select Don’t Use SSH Tunnel
Click Test and then Save
Wait a few seconds while dex tests the connection
On the next screen, toggle all datapoints on
Click Run to execute a manual data replication
Click on the Runs tab to check your replication status
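
The dex form takes each credential as a separate field, so you can paste the password as-is. If you ever test the same database from your own tooling with a connection URI, note that the sample password contains characters (%, #, @) that must be percent-encoded. A minimal sketch, with a hypothetical helper name:

```python
from urllib.parse import quote


def postgres_uri(user, password, host, port, database):
    """Build a postgresql:// URI, percent-encoding every reserved character in the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )
```

Without the encoding, the # in the password would be read as the start of a URI fragment and the @ as the end of the credentials.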

Step 6: Build your first model

Now that we have raw data connected, we can start modeling it. In this example, we'll organize data into two layers: raw and cleaned. Most organizations build additional layers (like trusted, analytics, or mart) on top of these, but this will give us a solid foundation.

Create Your Model Folders

In the left-hand menu, go to Develop
Right-click the Models folder and create two subfolders:
- 1.raw
- 2.cleaned

Create Your First Model: `customers.sql`

Right-click on the 1.raw folder and create a new file: customers.sql
Paste the following code into the file (update the from clause with your copied source):

    select
        customer_state as customer_state,
        customer_unique_id as customer_unique_id,
        regexp_extract(customer_id,r'^.{0,3}') as customer_id,
        customer_city as customer_city,
        customer_zip_code_prefix as customer_zip_code_prefix
    from 
        <your_copy_as_source_here>

If you encounter any errors related to the data source configuration, refer to this section in the documentation
Click Save
Click Preview and Run to see the results
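
The regexp_extract call in this model keeps at most the first three characters of customer_id. The Python sketch below mirrors that pattern so you can see its effect on sample values; it uses Python's re module rather than BigQuery's regex engine, which behaves the same way for this simple pattern. The function name is hypothetical.

```python
import re


def first_three_chars(customer_id):
    """Return the part of `customer_id` matched by ^.{0,3}: up to its first three characters."""
    match = re.match(r"^.{0,3}", customer_id)
    # The pattern can match an empty string, so a match is always found.
    return match.group(0)
```

Keep in mind that the output column is still called customer_id but now holds only this prefix, while orders.sql selects customer_id unchanged.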

Add a Second Model: `orders.sql`

Right-click 1.raw again and add a new file: orders.sql

Paste in the following query:


    SELECT
        order_id,
        customer_id,
        order_status,
        order_purchase_timestamp,
        order_approved_at,
        order_delivered_carrier_date,
        order_delivered_customer_date,
        order_estimated_delivery_date
    FROM 
        <your_copy_as_source_here>

If you encounter any errors related to the data source configuration, refer to this section in the documentation
Click Save
Click Preview and Run to see the results

Add a Third Model: `order_payments.sql`

Right-click 1.raw again and create: order_payments.sql

Paste in the following code:

    select
        payment_type as payment_type,
        payment_value as payment_value,
        payment_installments as payment_installments,
        payment_sequential as payment_sequential,
        order_id as order_id
    from 
        <your_copy_as_source_here>

If you encounter any errors related to the data source configuration, refer to this section in the documentation
Click Save
Click Preview and Run to validate the model

Create a Cleaned Model: `customer_orders.sql`

Right-click on 2.cleaned and create a file called customer_orders.sql
Add the following code to join data across the three models:

{{
  config(
    tags=['finance']
  )
}}

with 

orders_info as (
    select
        order_id as order_id,
        customer_id as customer_id,
        order_purchase_timestamp as order_date,
        order_status as order_status
    from 
        {{ ref('orders') }}
),

payments_info as (
    select
        payment_type as payment_type,
        payment_value as payment_value,
        payment_installments as payment_installments,
        order_id as order_id
    from 
        {{ ref('order_payments') }}
),

customer_info as (
    select
        customer_id as customer_id,
        customer_city as customer_city
    from
        {{ ref('customers') }}
)

select
    o.order_id,
    o.customer_id,
    ci.customer_city,
    py.payment_value,
    py.payment_type,
    py.payment_installments,
    o.order_date,
    o.order_status
from
    orders_info o
left join
    payments_info py on py.order_id = o.order_id
left join
    customer_info ci on ci.customer_id = o.customer_id

Click Save
Click Preview and Run to see the final output
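
To get a feel for what the two left joins produce, you can reproduce the same join shape locally. The sketch below uses SQLite from Python's standard library with a reduced set of columns. The function name is hypothetical and any rows you pass in are your own sample data, not the trial dataset.

```python
import sqlite3


def customer_orders_rows(orders, payments, customers):
    """Join orders to payments and customers with the same left joins as customer_orders.sql."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table orders (order_id text, customer_id text);
        create table order_payments (order_id text, payment_value real);
        create table customers (customer_id text, customer_city text);
    """)
    conn.executemany("insert into orders values (?, ?)", orders)
    conn.executemany("insert into order_payments values (?, ?)", payments)
    conn.executemany("insert into customers values (?, ?)", customers)
    rows = conn.execute("""
        select o.order_id, ci.customer_city, py.payment_value
        from orders o
        left join order_payments py on py.order_id = o.order_id
        left join customers ci on ci.customer_id = o.customer_id
        order by o.order_id, py.payment_value
    """).fetchall()
    conn.close()
    return rows
```

Because both joins are left joins, every order is kept: an order without a matching payment or customer comes back with empty values in those columns, and an order with several payment rows comes back once per payment.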

Read more about Models in the Develop with dex page

Step 7: Change the way your model is materialized

One of the most powerful features of dex is the ability to control how models are materialized in your warehouse—without changing SQL. With a single configuration value, you can switch models from views to tables, and vice versa.

This gives you the flexibility to optimize performance and cost, while keeping your modeling layer clean and focused on business logic.

By default, all models are materialized as views. You can override this at the directory level, so every model inside that folder uses a different materialization strategy.

Update Your Project Configuration

In your file explorer, open the dbt_project.yml file at the root of your project
Update the project name (line 5) to: dex_lakehouse
Materialize your 1.raw and 2.cleaned models as tables by adding this configuration under models: (line 28):

...

# Configuring models
# Full documentation: https://dex-labs-1.gitbook.io/wip-dex-docs/project-settings-and-defaults

# In this example config, we tell dbt to build all models in the example/
# directory as views. These settings can be overridden in the individual model
# files using the `{{ config(...) }}` macro.
models:
  dex_lakehouse:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view
    1.raw:
      +materialized: table
      +schema: raw
    2.cleaned:
      +materialized: table
      +schema: cleaned

Save the file
If you want to override materialization for a specific model, you can do it inline at the top of the file using the config() block:

{{
  config(
    materialized='view'
  )
}}
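
The order of precedence described above (inline config() first, then the folder setting, then the default of view) can be sketched as a small lookup. This is a simplified illustration, not how dex resolves configuration internally: the function name is hypothetical, and it only considers the top-level folder under models/.

```python
DEFAULT_MATERIALIZATION = "view"

# Folder-level settings from the dbt_project.yml example in this step.
FOLDER_CONFIG = {"example": "view", "1.raw": "table", "2.cleaned": "table"}


def resolve_materialization(model_path, inline=None, folder_config=FOLDER_CONFIG):
    """Pick a materialization for a model path given relative to models/.

    An inline config() value wins, then the folder setting, then the default.
    """
    if inline is not None:
        return inline
    folder = model_path.split("/")[0]
    return folder_config.get(folder, DEFAULT_MATERIALIZATION)
```

For example, 1.raw/customers.sql resolves to table, while the same file with an inline materialized='view' resolves to view.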

Read more about materialization in the Materialization page

Step 8: Add tests to your models

Right-click the 2.cleaned folder in the Explorer panel and select New File.

Name the file: customer_orders.yml
Paste the following content into the file:

version: 2

models:
  - name: customer_orders
    description: "Joined dataset combining order, payment, and customer information."
    columns:
      - name: order_id
        description: "Unique identifier for each order"
        tests:
          - not_null
          - unique

What This Does

Model-level metadata: You are describing the customer_orders model.
Column-level tests:
- not_null: Ensures every row has an order_id
- unique: Ensures no duplicate order_id exists

Once you've saved the .yml file:

Run the model again.
The tests will automatically execute right after the model builds.
If a test fails, the flow status will change to Failed, and a notification will be sent to the Notification Center for review.
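
Conceptually, each of these tests is a query that looks for offending rows, and the test passes when none are found. The sketch below expresses both checks against a local SQLite table so you can try them on sample values. The function name is hypothetical and the queries are an approximation of the idea, not the exact SQL that dex runs.

```python
import sqlite3


def failing_rows(order_ids):
    """Count rows that would fail not_null and order_id values that would fail unique."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customer_orders (order_id text)")
    conn.executemany(
        "insert into customer_orders values (?)", [(value,) for value in order_ids]
    )
    # not_null: any row whose order_id is missing is a failure.
    not_null = conn.execute(
        "select count(*) from customer_orders where order_id is null"
    ).fetchone()[0]
    # unique: any order_id that appears more than once is a failure.
    unique = conn.execute("""
        select count(*) from (
            select order_id
            from customer_orders
            where order_id is not null
            group by order_id
            having count(*) > 1
        ) as duplicates
    """).fetchone()[0]
    conn.close()
    return {"not_null": not_null, "unique": unique}
```

Because customer_orders left-joins payments, an order with more than one payment row would appear more than once and be reported by the unique check.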

Step 9: Commit and push your changes to Git

1. Open the Git Tab

Navigate to the Git tab from the left-hand Develop menu. dex will list all the files you've created or modified since your last commit. In this case, you should see the following:

models/1.raw/customers.sql
models/1.raw/orders.sql
models/1.raw/order_payments.sql
models/2.cleaned/customer_orders.sql
models/2.cleaned/customer_orders.yml
dbt_project.yml

2. Stage Your Changes

Review the file list and click the checkbox next to each file you'd like to include in the commit. Click on any filename to open a diff view that highlights the changes made—ideal for validating updates before they are finalized.

Once you're ready, enter a commit title (e.g., Initial modeling: customer orders pipeline) and optionally include a description to give your team more context.

Click Stage files to commit, then Commit to save your changes locally.

3. Push to Your Git Repository

With the changes committed, you’ll now want to push them to your remote Git repository. Click the Push button at the bottom of the Git panel.

dex will push to the branch configured for the current environment. For example, if you're in the prod environment, it will push to the prod branch in your GitHub or GitLab repository.

Once pushed, your changes become available to the rest of your team and will be picked up by any Flows, automations, or scheduled runs configured on that branch.

Read more about GitHub/GitLab integration in the Version Control with Git page

Step 10: Automate your pipeline with Flow

Now that you've built and tested your data models, it's time to automate the entire workflow. In dex, this is done using Flows—orchestration pipelines that can run on a schedule or be triggered manually.

Creating a Flow to Automate Your Workflow

1. Open Flows

Open the Flows section from the left-hand navigation menu.

2. Create a new flow

Click the New Flow button on the top right. Fill in the required fields:

Name: Give your flow a meaningful name (e.g., daily ecommerce pipeline)
Description (optional): Briefly describe what the flow will do

Then, configure the schedule:

Set the run time to Every day at 04:00 AM
Click Continue to proceed
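
If you want to work out when a daily 04:00 schedule will fire next, the arithmetic is short. The sketch below uses naive datetimes and a hypothetical function name; which time zone dex applies to the schedule is not covered here, so treat the clock as whatever zone your flow is configured in.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now, run_at=time(4, 0)):
    """Return the next occurrence of `run_at` strictly after `now`."""
    candidate = datetime.combine(now.date(), run_at)
    # If today's slot has already passed (or is right now), move to tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```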

3. Add a Connection node

The first step in your pipeline is data ingestion. Create a Connection node and select the connector you configured in Step 5 of your setup. This ensures your source data is always up-to-date before the transformations run.

4. Add a new Transformation node

Create a Transformation node by selecting:

Project: your current working project
Environment: the correct environment (e.g., production or dev)

In the Include field, type: +customer_orders.sql

Then press Enter.

This argument tells dex to execute the customer_orders.sql model and all of its upstream dependencies. The + prefix automatically includes every model required for this one to work—no need to manually list them.
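
The effect of the + prefix can be pictured as a walk up the dependency graph followed by a sort that puts upstream models first. The sketch below illustrates this with the four models from this guide. It is a simplified illustration with a hypothetical function name, not dex's actual selection logic, and it requires Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Each model mapped to the models it references with ref().
DEPENDENCIES = {
    "customers": [],
    "orders": [],
    "order_payments": [],
    "customer_orders": ["orders", "order_payments", "customers"],
}


def select_with_upstream(model, dependencies=DEPENDENCIES):
    """Return `model` and all of its upstream models, in an order where dependencies run first."""
    selected, stack = set(), [model]
    while stack:
        current = stack.pop()
        if current not in selected:
            selected.add(current)
            stack.extend(dependencies.get(current, []))
    # Sort only the selected models so that every dependency precedes its dependents.
    subgraph = {name: dependencies.get(name, []) for name in selected}
    return list(TopologicalSorter(subgraph).static_order())
```

Selecting customer_orders this way returns all four models with customer_orders last, while selecting orders returns only orders.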

5. Save and Run

Click Save to store your Flow configuration.

Now, run it manually by using the Run Now option. This helps confirm your flow works as expected before it runs on schedule.

6. Done!

Your ingestion and transformation flow is now live and will run automatically every day at 04:00 AM.

7. Monitor Runs

You can view run history and inspect execution results by clicking the Runs tab inside your flow. This includes:

Run status (e.g., succeeded or failed)
Start and end time
Task execution logs
Troubleshooting details for each model

Read more about Flows in the Flows and Automation page

Congratulations! You’ve Completed Your First Data Journey

You’ve just built a fully operational data pipeline using dex—from ingestion to transformation to orchestration.

All data you’ve generated and automated is now stored in your own cloud environment—fully queryable and ready to be consumed by any BI tool, notebook, or data science workflow.

This is a solid foundation that mirrors real-world data engineering best practices. But this is just the beginning.

dex is built to grow with your complexity—whether that’s more data sources, advanced transformations, or multiple teams collaborating on analytics. The rest of our documentation will help you expand your capabilities, customize workflows, and unlock new use cases.

Happy building!
