Configuring Environments
To create, edit, or delete Environments, go to the side menu → Organization → Settings.

Once an environment is created, you can configure the following settings:
Name and Description: Provide a clear and meaningful name and description to help differentiate between environments.
Branches
Branch Source: The Git branch to pull code from.
Branch Destination: The Git branch where changes will be committed.
Examples
Production Environment
Branch Source: main or master
Branch Destination: main or master (same as source)
Development Environment
Branch Source: main or master
Branch Destination: dev
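
Conceptually, each environment is just a pairing of a source branch and a destination branch. The sketch below expresses the two examples above as a plain Python mapping; the dictionary and helper function are illustrative only and are not part of any product API.

```python
# Hypothetical illustration of the branch settings shown above.
# Neither ENVIRONMENTS nor branches_for() is part of any product API.
ENVIRONMENTS = {
    "production": {"branch_source": "main", "branch_destination": "main"},
    "development": {"branch_source": "main", "branch_destination": "dev"},
}

def branches_for(env_name: str) -> tuple[str, str]:
    """Return the (source, destination) branches configured for an environment."""
    env = ENVIRONMENTS[env_name]
    return env["branch_source"], env["branch_destination"]

print(branches_for("development"))  # ('main', 'dev')
```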
Connection Settings
Set up the environment's cloud connection (e.g., AWS or GCP) by selecting the appropriate configuration for that provider.
AWS Environment Configurations
Database: The database (data catalog) to build models into. Example: awsdatacatalog
Region: The AWS region of your Athena instance. Examples: us-east-1, eu-west-1
S3 Staging Directory: The S3 location used to store Athena query results and metadata. Example: s3://my_bucket/my_folder/...
Schema: The schema (Athena database) to build models into (lowercase only). Examples: production, development, test
Number of Boto3 Retries: Number of times to retry boto3 requests (e.g., deleting S3 files for materialized tables). Example: 3
Number of Retries: Number of times to retry a failing query. Example: 3
S3 Data Directory: Prefix for storing tables, if different from the connection's S3 Staging Directory. Example: s3://my_bucket/my_folder/...
S3 Data Naming Convention: How table paths are generated within the S3 Data Directory. Example: schema_table: {s3_data_dir}/{schema}/...
S3 Temp Tables Prefix: Prefix for storing temporary tables, if different from the connection's S3 Data Directory.
Spark Work Group: Identifier of the Athena Spark workgroup used to run Python models.
Number of Threads: Number of threads to use. Example: 4
Work Group: Identifier of the Athena workgroup.
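
If you want to sanity-check these values outside the app, most of them map directly onto the connection parameters of the PyAthena client. The sketch below is illustrative only, assuming PyAthena is installed and AWS credentials are available in the usual way; the bucket, schema, and workgroup names are placeholders.

```python
# Minimal sketch: verifying the Athena connection values with PyAthena.
# All bucket, schema, and workgroup names below are placeholders.
from pyathena import connect

conn = connect(
    region_name="us-east-1",                     # Region
    s3_staging_dir="s3://my_bucket/my_folder/",  # S3 Staging Directory
    schema_name="production",                    # Schema (Athena database)
    work_group="primary",                        # Work Group
)

cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())  # [(1,)] if the connection settings are valid
```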
GCP Environment Configurations
Project ID: The GCP project ID that contains your BigQuery datasets. Example: my-project
Number of Threads: The number of threads to use for parallel execution. Example: 4
Dataset: The default BigQuery dataset to use; equivalent to a schema. Example: my_dataset
Priority: The priority with which to execute BigQuery queries. Example: batch
Job Execution Timeout (Seconds): Maximum number of seconds to wait for a query to complete. Example: 300
Job Creation Timeout (Seconds): Maximum number of seconds to wait when submitting a job. Example: 300
Number of Job Retries: The number of times to retry a failed job. Example: 3
Job Retries Deadline (Seconds): Maximum time in seconds for a job and its retries before raising an error. Example: 300
Location: The geographic location of your BigQuery dataset. Example: US
Maximum Bytes Billed: The maximum number of bytes a single BigQuery query can bill; queries that exceed this limit fail.
Scopes: The scopes used to authenticate the connection.
Service Account to Impersonate: The Google service account to impersonate when making API requests.
Dataproc Region: The Google Cloud region for PySpark workloads on Dataproc. Examples: us-east1, us-west1, ...
GCS Bucket Name: The URI of a Google Cloud Storage bucket that hosts Python code executed via Dataproc.
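
Similarly, most of these BigQuery settings correspond to options on the google-cloud-bigquery client, which can be used to verify them independently. The sketch below is illustrative only, assuming the google-cloud-bigquery package is installed and application default credentials are configured; the project, dataset, and byte limit are placeholders.

```python
# Minimal sketch: exercising the BigQuery settings with google-cloud-bigquery.
# Project, dataset, and limits below are placeholders, not recommended values.
from google.cloud import bigquery

client = bigquery.Client(project="my-project", location="US")  # Project ID, Location

job_config = bigquery.QueryJobConfig(
    priority=bigquery.QueryPriority.BATCH,                           # Priority: batch
    maximum_bytes_billed=10 * 1024**3,                               # Maximum Bytes Billed (10 GiB here)
    default_dataset=bigquery.DatasetReference("my-project", "my_dataset"),  # Dataset
)

job = client.query("SELECT 1 AS ok", job_config=job_config)
rows = job.result(timeout=300)       # Job Execution Timeout (Seconds)
print([dict(row) for row in rows])   # [{'ok': 1}]
```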