/transfer -c /path/to/config.yaml

Note: Keys here are formatted in dot notation for readability. Please ensure that the proper nesting is done when writing them into your configuration file. To see sample configuration files, visit the examples page.

Key | Optional | Description
mode | Y | Defaults to replication. Supported values are currently:
  • replication
  • history
outputSource | N | This is the destination. Supported values are currently:
  • snowflake
  • bigquery
  • s3
  • databricks
queue | Y | Defaults to kafka. Valid options are kafka and pubsub. Please check the respective sections below for what else is required.
reporting.sentry.dsn | Y | DSN for Sentry alerts. If blank, alerts will just go to stdout.
flushIntervalSeconds | Y | Defaults to 10. Valid range is between 5 seconds and 6 hours.
bufferRows | Y | Defaults to 15,000.
flushSizeKb | Y | Defaults to 25 MB. When the in-memory database is greater than this value, it will trigger a flush cycle.
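
To illustrate the nesting that the dot notation implies, here is a minimal sketch of the top-level keys written out as YAML. The values are the documented defaults or placeholders; the flushSizeKb figure assumes that 25 MB is expressed as 25 * 1024 KB, which is an assumption rather than something this page states.

```yaml
# Sketch only: documented defaults or placeholder values.
mode: replication
outputSource: snowflake
queue: kafka
reporting:
  sentry:
    dsn: ""  # reporting.sentry.dsn, nested; blank means alerts go to stdout
flushIntervalSeconds: 10
bufferRows: 15000
flushSizeKb: 25600  # assumption: 25 MB written as 25 * 1024 KB
```

Note that reporting.sentry.dsn becomes three nested levels, while keys without dots stay at the top level.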

Kafka

kafka:
  bootstrapServer: localhost:9092,localhost:9093
  groupID: transfer
  username: artie
  password: transfer
  enableAWSMSKIAM: false

Key | Optional | Description
kafka.bootstrapServer | N | Pass in the Kafka bootstrap server. As a best practice, pass in a comma-separated list of bootstrap servers to maintain high availability. This is the same spec as Kafka.
kafka.groupID | N | This is the name of the Kafka consumer group. You can set it to whatever you’d like. Just remember that the offsets are associated with a particular consumer group.
kafka.username | Y | If you’d like to use SASL/SCRAM auth, you can pass the username and password.
kafka.password | Y | If you’d like to use SASL/SCRAM auth, you can pass the username and password.
kafka.enableAWSMSKIAM | Y | Turn this on if you would like to use IAM authentication to communicate with Amazon MSK. If you enable this, make sure to pass in AWS_REGION, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
kafka.topicConfigs | N | Follow the same convention as pubsub.topicConfigs below.

Google Pub/Sub

If you don’t have access to Kafka, you can use Google Pub/Sub. For the best experience, we recommend using Kafka.

pubsub:
  projectID: 123
  pathToCredentials: /path/to/pubsub.json
  topicConfigs:
    - {}

Key | Optional | Description
pubsub.projectID | N | This is your GCP Project ID. Click here to see how you can find it.
pubsub.pathToCredentials | N | This is the path to the credentials for the service account to use. You can re-use the same credentials as BigQuery, or you can use a different service account to support use cases of cross-account transfers.
pubsub.topicConfigs | N | Follow the same convention as kafka.topicConfigs above.

Topic Configs

topicConfigs are used at the table level and store configurations like:

  • Destination’s database, schema and table name.
  • The data format.
  • Whether it should do row based soft deletion or not.
  • Whether it should drop deleted columns or not.
kafka:
  topicConfigs:
    - {}
    - {}
# OR as
pubsub:
  topicConfigs:
    - {}
    - {}

Key | Optional | Description
*.topicConfigs[0].db | N | Name of the database in destination.
*.topicConfigs[0].tableName | Y | Name of the table in destination.
  • If not provided, will use table name from event
  • If provided, tableName acts as an override
*.topicConfigs[0].schema | N | Name of the schema in Snowflake. Not needed for BigQuery.
*.topicConfigs[0].topic | N | Name of the Kafka topic.
*.topicConfigs[0].cdcFormat | N | Name of the CDC connector (and thus the format) that Transfer should expect to parse. Currently, the supported values are:
  1. debezium.postgres
  2. debezium.mongodb
  3. debezium.mysql
*.topicConfigs[0].cdcKeyFormat | N | Format that Kafka Connect uses for the key. This is called key.converter in the Kafka Connect properties file. The supported values are:
  • org.apache.kafka.connect.storage.StringConverter
  • org.apache.kafka.connect.json.JsonConverter
  If not provided, the default value will be org.apache.kafka.connect.storage.StringConverter.
*.topicConfigs[0].dropDeletedColumns | Y | Defaults to false. When set to true, Transfer will drop columns in the destination when it detects that the source has dropped these columns. This setting should be turned on if your organization follows standard practice around database migrations. This is available starting transfer:1.4.4.
*.topicConfigs[0].softDelete | Y | Defaults to false. When set to true, Transfer will add an additional column called __artie_delete and will set the column to true instead of issuing a hard deletion. This is available starting transfer:1.4.4.
*.topicConfigs[0].skippedOperations | Y | Comma-separated string of operations for Transfer to skip. Valid values are:
  • c (create)
  • r (replication or backfill)
  • u (update)
  • d (delete)
  Can be specified like: c,d to skip creates and deletes. This is available starting transfer:2.2.3.
*.topicConfigs[0].skipDelete | Y | This will be deprecated in the next Transfer version. Use skippedOperations instead. Defaults to false. When set to true, Transfer will skip the delete events. This is available starting transfer:2.0.48.
*.topicConfigs[0].includeArtieUpdatedAt | Y | Defaults to false. When set to true, Transfer will emit an additional timestamp column named __artie_updated_at which signifies when this row was processed. This is available starting transfer:2.0.17.
*.topicConfigs[0].includeDatabaseUpdatedAt | Y | Defaults to false. When set to true, Transfer will emit an additional timestamp column called __artie_db_updated_at which signifies the database time of when the row was processed. This is available starting transfer:2.2.2.
*.topicConfigs[0].bigQueryPartitionSettings | Y | Enable this to turn on BigQuery table partitioning. This is available starting transfer:2.0.24.
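
As an illustration of how these keys combine, here is a sketch of a single filled-in topicConfigs entry under kafka. The database, schema, table and topic names are hypothetical placeholders, and the optional flags are shown only to make the nesting concrete.

```yaml
# Sketch only: db, schema, tableName and topic are hypothetical names.
kafka:
  topicConfigs:
    - db: analytics
      schema: public
      tableName: customers            # optional override of the event's table name
      topic: dbserver1.public.customers
      cdcFormat: debezium.postgres
      cdcKeyFormat: org.apache.kafka.connect.storage.StringConverter
      dropDeletedColumns: false
      softDelete: true                # adds __artie_delete instead of hard deleting
      includeArtieUpdatedAt: true     # adds __artie_updated_at
```

The same entry shape applies under pubsub.topicConfigs, since both follow the same convention.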

BigQuery Partition Settings

bigQueryPartitionSettings:
  partitionType: time
  partitionField: ts
  partitionBy: daily

Key | Optional | Description
bigQueryPartitionSettings.partitionType | N | Type of partitioning. We currently support only time-based partitioning, so the only valid value right now is time.
bigQueryPartitionSettings.partitionField | N | Which field or column is being partitioned on.
bigQueryPartitionSettings.partitionBy | N | The time granularity used for time partitioning. The only valid value right now is daily.
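
Since bigQueryPartitionSettings is a key of an individual topic config, the block sits inside a topicConfigs entry. The following is a sketch of that nesting; the db, tableName and topic values are hypothetical.

```yaml
# Sketch only: db, tableName and topic are hypothetical names.
kafka:
  topicConfigs:
    - db: analytics
      tableName: events
      topic: dbserver1.public.events
      cdcFormat: debezium.postgres
      bigQueryPartitionSettings:
        partitionType: time
        partitionField: ts
        partitionBy: daily
```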

BigQuery

Key | Optional | Description
bigquery.pathToCredentials | N | Path to the credentials file for Google. You can also set the GOOGLE_APPLICATION_CREDENTIALS environment variable directly. Otherwise, Transfer will set it for you based on the value provided here.
bigquery.projectID | N | Google Cloud Project ID.
bigquery.location | Y | Location of the BigQuery dataset. Defaults to us.
bigquery.defaultDataset | N | The default dataset used. This just allows us to connect to BigQuery using a data source name (DSN).
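
A sketch of the corresponding YAML block, using the keys from the table above. The credentials path, project ID and dataset name are placeholders.

```yaml
# Sketch only: path, project ID and dataset are placeholders.
outputSource: bigquery
bigquery:
  pathToCredentials: /path/to/bigquery.json
  projectID: my-gcp-project
  location: us            # optional; us is the documented default
  defaultDataset: my_dataset
```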

Snowflake

Please see our Snowflake guide on how you can grab these values.

Key | Optional | Description
snowflake.account | N | Account identifier
snowflake.username | N | Snowflake username
snowflake.password | N | Snowflake password
snowflake.warehouse | N | Virtual warehouse name
snowflake.region | N | Snowflake region
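
A sketch of the corresponding YAML block, using the keys from the table above. Every value is a placeholder to be replaced with the values from your Snowflake account.

```yaml
# Sketch only: all values are placeholders.
outputSource: snowflake
snowflake:
  account: my-account-identifier
  username: my-user
  password: my-password
  warehouse: my-warehouse
  region: us-east-1
```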

Redshift

Key | Optional | Description
redshift.host | N | Host URL, e.g. test-cluster.us-east-1.redshift.amazonaws.com
redshift.port | N | -
redshift.database | N | Namespace / Database in Redshift.
redshift.username | N | -
redshift.password | N | -
redshift.bucket | N | Bucket where staging files will be stored. Click here to see how to set up an S3 bucket and have it automatically purged based on expiration.
redshift.optionalS3Prefix | Y | The prefix for S3. For example, if the bucket is foo and the prefix is bar, files are written to s3://foo/bar/file.txt
redshift.credentialsClause | N | Redshift credentials clause used to store staging files in S3.
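
A sketch of the corresponding YAML block, using the keys from the table above. All values are placeholders. The port shown is Redshift's usual default, and the credentials clause is left as a placeholder because its exact expected format is not described on this page.

```yaml
# Sketch only: all values are placeholders.
redshift:
  host: test-cluster.us-east-1.redshift.amazonaws.com
  port: 5439                     # Redshift's usual default port
  database: my_database
  username: my-user
  password: my-password
  bucket: foo
  optionalS3Prefix: bar          # files land under s3://foo/bar/...
  credentialsClause: "<your Redshift credentials clause>"
```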

S3

s3:
  bucket: artie-transfer
  folderName: foo # Files will be saved under s3://artie-transfer/foo/...
  awsAccessKeyID: AWS_ACCESS_KEY_ID
  awsSecretAccessKey: AWS_SECRET_ACCESS_KEY

Key | Type | Optional | Description
s3.bucket | String | N | S3 bucket name. Example: artie-transfer.
s3.folderName | String | Y | Optional folder name within the bucket. If this is specified, Artie Transfer will save the files under s3://artie-transfer/folderName/...
s3.awsAccessKeyID | String | N | The AWS_ACCESS_KEY_ID for the service account.
s3.awsSecretAccessKey | String | N | The AWS_SECRET_ACCESS_KEY for the service account.

Telemetry

Overview of Telemetry can be found here.

Key | Type | Optional | Description
telemetry.metrics | Object | Y | Parent object. See below.
telemetry.metrics.provider | String | Y | Provider to export metrics to. Transfer currently only supports: datadog.
telemetry.metrics.settings | Object | Y | Additional settings block. See below.
telemetry.metrics.settings.tags | Array | Y | Tags that will appear for every metric, like: env:production, company:foo
telemetry.metrics.settings.addr | String | Y | Address where the statsD agent is running. Defaults to 127.0.0.1:8125 if none is provided.
telemetry.metrics.settings.sampling | Number | Y | Percentage of data to send, expressed as a number between 0 and 1. Defaults to 1 if none is provided. Refer to this for additional information.
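
A sketch of the telemetry keys above written out as nested YAML. The tag values are the examples from the table, and addr and sampling are set to their documented defaults.

```yaml
# Sketch only: tags are example values; addr and sampling are the defaults.
telemetry:
  metrics:
    provider: datadog
    settings:
      tags:
        - env:production
        - company:foo
      addr: "127.0.0.1:8125"
      sampling: 1
```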