Configuring Environments

To create, edit, or delete Environments, go to the side menu → Organization → Settings.

[Image: Environments menu]

Once an environment is created, you can configure the following settings:

  1. Name and Description: Provide a clear and meaningful name and description to help differentiate between environments.

  2. Branches

    • Branch Source: The Git branch to pull code from.

    • Branch Destination: The Git branch where changes will be committed.

Examples

  • Production Environment

    • Branch Source: main or master

    • Branch Destination: main or master (same as source)

  • Development Environment

    • Branch Source: main or master

    • Branch Destination: dev
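
To make the source/destination relationship concrete, the two example environments above can be pictured as the following sketch. The keys are purely illustrative (this is not a real configuration file; these settings are made in the UI):

```yaml
# Hypothetical illustration only -- these keys are not a real config file
environments:
  production:
    branch_source: main        # code is pulled from main
    branch_destination: main   # changes are committed back to main
  development:
    branch_source: main        # start from the latest code on main
    branch_destination: dev    # changes are committed to dev, not main
```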

  3. Connection Settings: Set up the environment's cloud connection (e.g., AWS or GCP) by selecting the appropriate configuration for that provider.

AWS Environment Configurations

| Field | Description | Example |
| --- | --- | --- |
| Database | The database (data catalog) to build models into | awsdatacatalog |
| Region | AWS region of your Athena instance | us-east-1, eu-west-1 |
| S3 Staging Directory | S3 location to store Athena query results and metadata | s3://my_bucket/my_folder/... |
| Schema | The schema (Athena database) to build models into (lowercase only) | production, development, test |
| Number of Boto3 Retries | Number of times to retry boto3 requests (e.g., deleting S3 files for materialized tables) | 3 |
| Number of Retries | Number of times to retry a failing query | 3 |
| S3 Data Directory | Prefix for storing tables, if different from the connection's S3 Staging Directory | s3://my_bucket/my_folder/... |
| S3 Data Naming Convention | How to generate table paths inside the S3 Data Directory | schema_table: {s3_data_dir}/{schema}/{table}/ |
| S3 Temp Tables Prefix | Prefix for storing temporary tables, if different from the S3 Data Directory | |
| Spark Work Group | Identifier of the Athena Spark workgroup used to run Python models | |
| Number of Threads | Number of threads to use for parallel execution | 4 |
| Work Group | Identifier of the Athena workgroup | |
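
These fields mirror the connection options of the dbt-athena adapter, so the same environment can also be expressed as a `profiles.yml` target. A minimal sketch, assuming dbt-athena is the underlying adapter; the profile name, bucket paths, and workgroup are illustrative:

```yaml
# Sketch: a profiles.yml target mirroring the AWS fields above (dbt-athena)
my_project:
  target: production
  outputs:
    production:
      type: athena
      database: awsdatacatalog                 # Database (data catalog)
      schema: production                       # Schema (Athena database), lowercase only
      region_name: us-east-1                   # Region
      s3_staging_dir: s3://my_bucket/staging/  # S3 Staging Directory
      s3_data_dir: s3://my_bucket/data/        # S3 Data Directory (optional)
      s3_data_naming: schema_table             # S3 Data Naming Convention
      work_group: primary                      # Work Group
      spark_work_group: spark-primary          # Spark Work Group (for Python models)
      num_retries: 3                           # Number of Retries
      num_boto3_retries: 3                     # Number of Boto3 Retries
      threads: 4                               # Number of Threads
```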

GCP Environment Configurations

| Field | Description | Example |
| --- | --- | --- |
| Project ID | The GCP project ID that contains your BigQuery datasets | my-project |
| Number of Threads | The number of threads to use for parallel execution | 4 |
| Dataset | The default BigQuery dataset to build models into (equivalent to a schema) | my_dataset |
| Priority | The priority with which to execute BigQuery queries | batch |
| Job Execution Timeout (Seconds) | Maximum number of seconds to wait for a query to complete | 300 |
| Job Creation Timeout (Seconds) | Maximum number of seconds to wait when submitting a job | 300 |
| Number of Job Retries | The number of times to retry a failed job | 3 |
| Job Retries Deadline (Seconds) | Maximum time in seconds allowed for a job and its retries before raising an error | 300 |
| Location | The geographical location of your BigQuery dataset | US |
| Maximum Bytes Billed | The maximum number of bytes that can be billed for a given query; queries exceeding this limit will fail | |
| Scopes | Scopes used to authenticate the connection | |
| Service Account to Impersonate | The Google service account to impersonate when making API requests | |
| Dataproc Region | The Google Cloud region for PySpark workloads on Dataproc | us-east1, us-west1, ... |
| GCS Bucket Name | The URI of a Google Cloud Storage bucket used to host Python code executed via Dataproc | |
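
Likewise, the GCP fields line up with the profile options of the dbt-bigquery adapter. A minimal sketch, assuming dbt-bigquery with OAuth authentication; the profile name, project, dataset, and bucket are illustrative:

```yaml
# Sketch: a profiles.yml target mirroring the GCP fields above (dbt-bigquery)
my_project:
  target: production
  outputs:
    production:
      type: bigquery
      method: oauth                        # assumption: OAuth-based authentication
      project: my-project                  # Project ID
      dataset: my_dataset                  # Dataset
      threads: 4                           # Number of Threads
      priority: batch                      # Priority (batch or interactive)
      job_execution_timeout_seconds: 300   # Job Execution Timeout
      job_creation_timeout_seconds: 300    # Job Creation Timeout
      job_retries: 3                       # Number of Job Retries
      job_retry_deadline_seconds: 300      # Job Retries Deadline
      location: US                         # Location
      maximum_bytes_billed: 1000000000     # Maximum Bytes Billed
      dataproc_region: us-east1            # Dataproc Region (for Python models)
      gcs_bucket: my-bucket                # GCS Bucket Name
```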
