Options - Artie

/transfer -c /path/to/config.yaml

Note: Keys here are written in dot notation for readability. Make sure they are properly nested when you write them into your configuration file. To see sample configuration files, visit the examples page.

Key	Optional	Description
`mode`	Y	Defaults to `replication`. Supported values are currently: `replication` `history`
`outputSource`	N	This is the destination. Supported values are currently: `snowflake` `bigquery` `s3` `databricks`
`queue`	Y	Defaults to `kafka`. Valid options are `kafka` and `pubsub`. Check the respective sections below for what else is required.
`reporting.sentry.dsn`	Y	DSN for Sentry alerts. If blank, alerts go to stdout only.
`flushIntervalSeconds`	Y	Defaults to `10`. Valid range is from 5 seconds to 6 hours (`5` to `21600`).
`bufferRows`	Y	Defaults to `15,000`
`flushSizeKb`	Y	Defaults to `25mb`
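
As a rough sketch of how the dot-notation keys above nest in a configuration file, the fragment below spells out the top-level options. The values are illustrative only, and the Sentry DSN is a placeholder rather than a real endpoint.

```yaml
# Illustrative values only; every option except outputSource is optional.
mode: replication
outputSource: snowflake
queue: kafka
flushIntervalSeconds: 10
bufferRows: 15000
# `reporting.sentry.dsn` in dot notation becomes three nested keys.
reporting:
  sentry:
    dsn: "https://publicKey@example.ingest.sentry.io/0"  # placeholder DSN
```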

Source

Kafka

Key	Optional	Description
`kafka.bootstrapServer`	N	Pass in the Kafka bootstrap server. For best practices, pass in a comma separated list of bootstrap servers to maintain high availability. This is the same spec as Kafka.
`kafka.groupID`	N	This is the name of the Kafka consumer group. You can set it to whatever you’d like. Just remember that offsets are associated with a particular consumer group.
`kafka.username`	Y	If you’d like to use SASL/SCRAM auth, you can pass the username and password.
`kafka.password`	Y	If you’d like to use SASL/SCRAM auth, you can pass the username and password.
`kafka.enableAWSMSKIAM`	Y	Enable this if you would like to use IAM authentication to communicate with Amazon MSK. If enabled, please ensure `AWS_REGION`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are set.
`kafka.disableTLS`	Y	Enable this to disable TLS.
`kafka.topicConfigs`	N	Follow the same convention as `*.topicConfigs` below.

Example
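
A minimal sketch of a Kafka source block built from the keys in the table above. The broker hosts, group name, credentials, database, and topic name are hypothetical placeholders; only the key names and the `cdcFormat` / `cdcKeyFormat` values come from this page.

```yaml
queue: kafka
kafka:
  # Comma-separated list of bootstrap servers (hypothetical hosts).
  bootstrapServer: "broker-1:9092,broker-2:9092"
  groupID: transfer-group        # hypothetical consumer group name
  username: transfer-user        # only needed for SASL/SCRAM auth
  password: change-me            # only needed for SASL/SCRAM auth
  topicConfigs:
    - db: customers              # hypothetical destination database
      schema: public
      topic: dbserver1.public.customers   # hypothetical topic name
      cdcFormat: debezium.postgres
      cdcKeyFormat: org.apache.kafka.connect.storage.StringConverter
```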

Google Pub/Sub

If you don’t have access to Kafka, you can use Google Pub/Sub. However, we recommend using Kafka for the best possible experience.

Key	Optional	Description
`pubsub.projectID`	N	This is your GCP Project ID. Click here to see how you can find it.
`pubsub.pathToCredentials`	N	This is the path to the credentials for the service account to use. You can re-use the same credentials as BigQuery, or you can use a different service account to support use cases of cross-account transfers.
`pubsub.topicConfigs`	N	Follow the same convention as `*.topicConfigs` below.

Example
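
A minimal sketch of a Pub/Sub source block using the keys in the table above. The project ID, credentials path, and topic details are hypothetical placeholders, and the sketch assumes that `topic` holds the Pub/Sub topic name when `queue` is `pubsub`.

```yaml
queue: pubsub
pubsub:
  projectID: my-gcp-project                      # hypothetical project ID
  pathToCredentials: /path/to/credentials.json   # service account credentials
  topicConfigs:
    - db: customers                              # hypothetical destination database
      schema: public
      topic: dbserver1.public.customers          # assumed to be the Pub/Sub topic name
      cdcFormat: debezium.postgres
      cdcKeyFormat: org.apache.kafka.connect.storage.StringConverter
```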

Topic Configs

`topicConfigs` are defined at the table level and store configurations such as:

The destination’s database, schema and table name.
The format of the incoming data.
Whether to do row-based soft deletion.
Whether to drop deleted columns.

Key	Optional	Description
`*.topicConfigs[0].db`	N	Name of the database in destination.
`*.topicConfigs[0].tableName`	Y	Name of the table in destination. If not provided, Transfer uses the table name from the event. If provided, `tableName` acts as an override.
`*.topicConfigs[0].schema`	N	Name of the schema in Snowflake. Not needed for BigQuery.
`*.topicConfigs[0].topic`	N	Name of the Kafka topic.
`*.topicConfigs[0].cdcFormat`	N	Name of the CDC connector (thus format) we should be expecting to parse against. The supported values are: `debezium.postgres` `debezium.mongodb` `debezium.mysql`
`*.topicConfigs[0].cdcKeyFormat`	N	Format that Kafka Connect uses for the key. This is called `key.converter` in the Kafka Connect properties file. The supported values are: `org.apache.kafka.connect.storage.StringConverter` `org.apache.kafka.connect.json.JsonConverter` If not provided, the default value is `org.apache.kafka.connect.storage.StringConverter`.
`*.topicConfigs[0].dropDeletedColumns`	Y	Defaults to `false`. When set to `true`, Transfer will drop columns in the destination when it detects that the source has dropped them. This option should be turned on if your organization follows standard practice around database migrations.
`*.topicConfigs[0].softDelete`	Y	Defaults to `false`. When set to `true`, Transfer will add an additional column called `__artie_delete` and will set the column to true instead of issuing a hard deletion.
`*.topicConfigs[0].skippedOperations`	Y	Comma-separated string of operations for Transfer to skip. Valid values are: `c` (create), `r` (replication or backfill), `u` (update), `d` (delete). For example, `c,d` skips creates and deletes.
`*.topicConfigs[0].includeArtieUpdatedAt`	Y	Defaults to `false`. When set to `true`, Transfer will emit an additional timestamp column named `__artie_updated_at` which signifies when this row was processed.
`*.topicConfigs[0].includeDatabaseUpdatedAt`	Y	Defaults to `false`. When set to `true`, Transfer will emit an additional timestamp column called `__artie_db_updated_at` which signifies the database time of when the row was processed.
`*.topicConfigs[0].bigQueryPartitionSettings`	Y	Enable this to turn on BigQuery table partitioning.
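
To make the `*.topicConfigs[0]` notation concrete, here is a sketch of two list entries nested under `kafka` (the `*` stands for the queue block, `kafka` or `pubsub`). The database, table, and topic names are hypothetical; the option names and allowed values are the ones listed in the table above.

```yaml
kafka:
  topicConfigs:
    # topicConfigs[0]: soft deletes plus the extra timestamp columns.
    - db: customers                        # hypothetical
      tableName: orders                    # overrides the table name from the event
      schema: public
      topic: dbserver1.public.orders       # hypothetical
      cdcFormat: debezium.postgres
      cdcKeyFormat: org.apache.kafka.connect.json.JsonConverter
      softDelete: true                     # sets __artie_delete instead of hard deleting
      includeArtieUpdatedAt: true          # adds __artie_updated_at
      includeDatabaseUpdatedAt: true       # adds __artie_db_updated_at
    # topicConfigs[1]: skip create and delete operations, drop removed columns.
    - db: customers                        # hypothetical
      schema: public
      topic: dbserver1.public.audit_log    # hypothetical
      cdcFormat: debezium.postgres
      cdcKeyFormat: org.apache.kafka.connect.storage.StringConverter
      dropDeletedColumns: true
      skippedOperations: "c,d"
```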

BigQuery partition settings

bigQueryPartitionSettings:
  partitionType: time
  partitionField: ts
  partitionBy: daily

Key	Optional	Description
`bigQueryPartitionSettings.partitionType`	N	Type of partitioning. Only time-based partitioning is currently supported, so the only valid value right now is `time`.
`bigQueryPartitionSettings.partitionField`	N	Which field or column is being partitioned on.
`bigQueryPartitionSettings.partitionBy`	N	Time granularity for time partitioning. The only valid value right now is `daily`.
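
Since `bigQueryPartitionSettings` is a topic-level option, the sketch below shows where the block would sit inside a single `topicConfigs` entry. The surrounding database and topic names are hypothetical, and using `kafka` as the parent is just one possibility.

```yaml
kafka:
  topicConfigs:
    - db: analytics                       # hypothetical BigQuery dataset
      topic: dbserver1.public.events      # hypothetical
      cdcFormat: debezium.postgres
      cdcKeyFormat: org.apache.kafka.connect.storage.StringConverter
      bigQueryPartitionSettings:
        partitionType: time               # only supported type
        partitionField: ts                # column to partition on
        partitionBy: daily                # only supported granularity
```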

Destination

BigQuery

Key	Optional	Description
`bigquery.pathToCredentials`	N	Path to the Google credentials file. You can also set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable directly; otherwise, Transfer sets it for you based on the value provided here.
`bigquery.projectID`	N	Google Cloud Project ID
`bigquery.location`	Y	Location of the BigQuery dataset. Defaults to `us`.
`bigquery.defaultDataset`	N	The default dataset used. This just allows us to connect to BigQuery using data source notation (DSN).
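
A minimal sketch of a BigQuery destination block based on the keys above. The project ID, dataset name, and credentials path are hypothetical placeholders.

```yaml
outputSource: bigquery
bigquery:
  pathToCredentials: /path/to/credentials.json
  projectID: my-gcp-project     # hypothetical
  location: us                  # optional; this is the default
  defaultDataset: my_dataset    # hypothetical
```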

Databricks

Key	Optional	Description
`databricks.host`	N	Host URL e.g. `https://test-cluster.azuredatabricks.net`
`databricks.httpPath`	N	HTTP path of the SQL warehouse
`databricks.port`	Y	HTTP port of the SQL warehouse (defaults to 443)
`databricks.catalog`	N	Unity catalog name
`databricks.personalAccessToken`	N	Personal access token for Databricks
`databricks.volume`	N	Volume name for Databricks. The volume must exist under the database and schema that you are replicating into.
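
A sketch of a Databricks destination block. It assumes that the personal access token and volume settings nest under `databricks` like the other keys; the HTTP path, catalog, token, and volume values are hypothetical placeholders.

```yaml
outputSource: databricks
databricks:
  host: https://test-cluster.azuredatabricks.net
  httpPath: /sql/1.0/warehouses/example-id   # hypothetical warehouse path
  port: 443                                  # optional; this is the default
  catalog: main                              # hypothetical Unity catalog name
  personalAccessToken: "<personal-access-token>"   # assumed to nest under databricks
  volume: transfer_volume                    # hypothetical; assumed to nest under databricks
```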

Microsoft SQL Server

Key	Optional	Description
`mssql.host`	N	Database host
`mssql.port`	N	-
`mssql.database`	N	Name of the database
`mssql.username`	N	-
`mssql.password`	N	-
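
A minimal sketch of a Microsoft SQL Server destination block. The host, database, and credentials are hypothetical placeholders; `1433` is the usual SQL Server port and may differ in your environment.

```yaml
mssql:
  host: mssql.example.internal   # hypothetical
  port: 1433                     # common SQL Server default
  database: analytics            # hypothetical
  username: transfer-user        # hypothetical
  password: change-me
```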

S3

Key	Type	Optional	Description
`s3.bucket`	String	N	S3 bucket name. Example: `artie-transfer`.
`s3.folderName`	String	Y	Optional folder name within the bucket. If this is specified, Artie Transfer will save the files under `s3://artie-transfer/folderName/...`
`s3.awsAccessKeyID`	String	N	The `AWS_ACCESS_KEY_ID` for the service account.
`s3.awsSecretAccessKey`	String	N	The `AWS_SECRET_ACCESS_KEY` for the service account.

Example
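
A minimal sketch of an S3 destination block using the keys above. The bucket name is the example from the table, and the credentials are placeholders to replace with your own.

```yaml
outputSource: s3
s3:
  bucket: artie-transfer
  folderName: folderName                        # optional; files land under s3://artie-transfer/folderName/...
  awsAccessKeyID: "<AWS_ACCESS_KEY_ID>"         # placeholder
  awsSecretAccessKey: "<AWS_SECRET_ACCESS_KEY>" # placeholder
```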

Snowflake

Key	Optional	Description
`snowflake.account`	N	Account Identifier
`snowflake.username`	N	Snowflake username
`snowflake.password`	N	Snowflake password
`snowflake.warehouse`	N	Virtual warehouse name
`snowflake.region`	N	Snowflake region
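
A minimal sketch of a Snowflake destination block. All values are hypothetical placeholders; substitute your own account identifier, warehouse, and region.

```yaml
outputSource: snowflake
snowflake:
  account: ab12345          # hypothetical account identifier
  username: transfer-user   # hypothetical
  password: change-me
  warehouse: COMPUTE_WH     # hypothetical virtual warehouse
  region: us-east-1         # hypothetical
```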

Redshift

Key	Optional	Description
`redshift.host`	N	Host URL e.g. `test-cluster.us-east-1.redshift.amazonaws.com`
`redshift.port`	N	-
`redshift.database`	N	Namespace / Database in Redshift.
`redshift.username`	N	-
`redshift.password`	N	-
`redshift.bucket`	N	Bucket for where staging files will be stored.
`redshift.optionalS3Prefix`	Y	The prefix for S3. For example, if the bucket is `foo` and the prefix is `bar`, files are written to `s3://foo/bar/file.txt`.
`redshift.credentialsClause`	N	Redshift credentials clause to store staging files into S3.
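
A sketch of a Redshift destination block. The database name, credentials, and role ARN are hypothetical, and the `IAM_ROLE '...'` form of the credentials clause is an assumption borrowed from Redshift's `COPY` syntax rather than something this page specifies. `5439` is the usual Redshift port.

```yaml
redshift:
  host: test-cluster.us-east-1.redshift.amazonaws.com
  port: 5439                # common Redshift default
  database: dev             # hypothetical
  username: transfer-user   # hypothetical
  password: change-me
  bucket: foo               # staging files go here
  optionalS3Prefix: bar     # results in s3://foo/bar/file.txt
  # Assumed format; check what your Redshift setup expects.
  credentialsClause: "IAM_ROLE 'arn:aws:iam::123456789012:role/example-role'"
```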

Telemetry

Overview of Telemetry can be found here.

Key	Type	Optional	Description
`telemetry.metrics`	Object	Y	Parent object. See below.
`telemetry.metrics.provider`	String	Y	Provider to export metrics to. Transfer currently only supports: datadog.
`telemetry.metrics.settings`	Object	Y	Additional settings block, see below
`telemetry.metrics.settings.tags`	Array	Y	Tags that will appear for every metric like: env:production, company:foo
`telemetry.metrics.settings.addr`	String	Y	Address for where the statsD agent is running. Defaults to 127.0.0.1:8125 if none is provided.
`telemetry.metrics.settings.sampling`	Number	Y	Percentage of data to send. Provide a number between 0 and 1. Defaults to 1 if none is provided. Refer to this for additional information.
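
A minimal sketch of the telemetry block, showing how the nested keys in the table fit together. The tag values are the examples from the table, and `addr` and `sampling` are set to their documented defaults.

```yaml
telemetry:
  metrics:
    provider: datadog
    settings:
      tags:
        - "env:production"
        - "company:foo"
      addr: "127.0.0.1:8125"   # default statsD agent address
      sampling: 1              # send 100% of data (default)
```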
