Today

Artie Transfer currently supports only metrics and integrates with Datadog. We are committed to being vendor neutral, but not at the cost of stability and reliability. As such, we will adopt OpenTelemetry once the library is stable.

We also plan to support application tracing so that Transfer can plug directly into your APM provider.

Metrics

You can specify additional tags in the configuration file, and they will be applied to all the metrics that Transfer emits. Click here for more details.

| Name | Description | Unit | Tags |
| --- | --- | --- | --- |
| `*.row.lag` | Difference between Kafka’s high watermark (which is at the partition level) and the current message offset. | - | groupid, topic, partition, table, mode |
| `*.ingestion.lag.95percentile` | p95 of the time lag between when a Kafka message was published and when it was received. Since Transfer 1.4.6. | ms | groupid, topic, partition, table, mode |
| `*.ingestion.lag.avg` | Average of the time lag between when a Kafka message was published and when it was received. Since Transfer 1.4.6. If self-hosting Transfer, this is a good metric to monitor. | ms | groupid, topic, partition, table, mode |
| `*.ingestion.lag.max` | Max of the time lag between when a Kafka message was published and when it was received. Since Transfer 1.4.6. | ms | groupid, topic, partition, table, mode |
| `*.process.message.count` | How many rows Transfer has processed. | Count | database, schema, table, groupid, op, skipped, mode |
| `*.process.message.95percentile` | p95 of how long processing each row takes. | ms | database, schema, table, groupid, op, skipped, mode |
| `*.process.message.avg` | Average of how long processing each row takes. | ms | database, schema, table, groupid, op, skipped, mode |
| `*.process.message.max` | Max of how long processing each row takes. | ms | database, schema, table, groupid, op, skipped, mode |
| `*.process.message.median` | Median of how long processing each row takes. | ms | database, schema, table, groupid, skipped, mode |
| `*.flush.count` | How many flush operations have been performed. | Count | database, schema, table, what, reason |
| `*.flush.95percentile` | p95 of how long each flush takes. | ms | database, schema, table, what, reason |
| `*.flush.avg` | Average of how long each flush takes. | ms | database, schema, table, what, reason |
| `*.flush.max` | Max of how long each flush takes. | ms | database, schema, table, what, reason |
| `*.flush.median` | Median of how long each flush takes. | ms | database, schema, table, what, reason |

The what tag explained

The what tag provides high-level visibility into whether an attempt succeeded. If it did not succeed, the tag identifies which particular operation failed, rather than reporting a generic error state.

Transfer will provide what:success if the attempt succeeded, and a different value depending on the error state if it failed. This way, our monitors and responses to failures can be more actionable, and we can jump straight to the offending code block.

Visualization of the what tag