
How it works

This is how Artie handles each table:
1. Backfill

  • Scans full table in batches, writes directly to destination
  • Can use read replica to minimize primary DB load

2. CDC

  • Reads database logs immediately
  • Queues all changes (inserts, updates, deletes) in Kafka

3. Backfill completes

  • Artie applies the queued CDC changes in order
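
To make the batched scan in step 1 more concrete, here is a minimal sketch using keyset pagination over an in-memory SQLite table. This is not Artie's implementation, which is not described here. The `users` table, the integer `id` primary key, the batch size, and the dict standing in for the destination are all assumptions made for illustration.

```python
import sqlite3


def scan_in_batches(conn, table, batch_size):
    """Yield rows of `table` in primary-key order, `batch_size` rows at a time.

    Assumes a hypothetical table with an integer primary key named `id` and a
    `name` column. `table` is interpolated directly, so it must be trusted.
    """
    last_id = None
    while True:
        # Keyset pagination: resume after the last key seen instead of using
        # OFFSET, so each batch is a cheap range read on the primary key.
        rows = conn.execute(
            f"SELECT id, name FROM {table} "
            "WHERE ? IS NULL OR id > ? ORDER BY id LIMIT ?",
            (last_id, last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


def demo_scan():
    """Copy a 7-row source table into a dict 'destination' in batches of 3."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(i, f"user{i}") for i in range(1, 8)],
    )
    destination = {}
    batch_sizes = []
    for batch in scan_in_batches(source, "users", 3):
        batch_sizes.append(len(batch))
        for row_id, name in batch:
            destination[row_id] = name  # write each batch straight through
    return destination, batch_sizes
```

Pointing `conn` at a read replica instead of the primary would keep this scan load off the primary database, which is the idea behind the second bullet in step 1.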

How to kick off a backfill

By default, when you onboard a new table, Artie kicks off a backfill to copy all the historical data to your destination. To trigger an ad-hoc backfill:
  1. Click on any running pipeline and go to the pipeline overview page
  2. Click on the Backfill tables button
Triggering an ad-hoc backfill
You can also cancel a backfill from the pipeline overview page.

Destination table behavior

When triggering an ad-hoc backfill, you choose what happens to the destination table before the backfill begins:

Do nothing

Artie upserts rows on top of the existing destination data. Use this when the table is already partially populated and you want to fill in missing or outdated rows without re-copying data that is already correct.

Truncate

Removes all rows from the destination table before the backfill begins, while preserving the table schema, indexes, and grants. This is the simplest option for a clean, complete reload of your data.

Drop table

Drops and recreates the destination table from scratch, including its schema. Use this when the table structure has drifted significantly and you need a completely fresh start.
This will break any downstream objects that depend on this table, such as views and materialized views. If you only need to reload the data, use Truncate instead.
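
The difference between the three options can be sketched with a toy in-memory table. This is an illustrative model only: the mode names (`do_nothing`, `truncate`, `drop`), the dict-based table layout, and the helper functions are hypothetical and are not Artie settings or APIs.

```python
def example_destination():
    """A hypothetical destination table with a drifted schema and stale rows."""
    return {
        "schema": ["id", "name", "legacy_col"],
        "rows": {
            1: {"id": 1, "name": "outdated"},
            99: {"id": 99, "name": "only-in-destination"},
        },
    }


def run_backfill(destination, source_schema, source_rows, mode):
    """Prepare the destination according to `mode`, then upsert source rows."""
    if mode == "truncate":
        # Remove all rows but keep the existing table definition.
        destination = {"schema": destination["schema"], "rows": {}}
    elif mode == "drop":
        # Recreate the table from scratch using the source schema.
        destination = {"schema": list(source_schema), "rows": {}}
    elif mode == "do_nothing":
        # Keep existing rows; the upsert below fills in or overwrites them.
        destination = {
            "schema": destination["schema"],
            "rows": dict(destination["rows"]),
        }
    else:
        raise ValueError(f"unknown mode: {mode}")

    for pk, row in source_rows.items():
        destination["rows"][pk] = dict(row)  # upsert keyed by primary key
    return destination
```

In this model, `do_nothing` leaves row 99 in place because the upsert only touches keys present in the source, while `truncate` and `drop` both end with exactly the source rows. Only `drop` replaces the schema.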

Advanced settings

You can configure additional backfill options from the advanced settings on the source tab in the pipeline editor.
Backfill advanced settings

How are backfills ordered?

In , tables are backfilled in the order they were added, up to the concurrency limit.
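
As a rough illustration of "in the order they were added, up to the concurrency limit", the sketch below simulates a first-in, first-out scheduler. The function name, the table names, the durations, and the limit of 2 are all hypothetical. The actual scheduler and its default limit are not documented here.

```python
import heapq
from collections import deque


def schedule_backfills(tables, concurrency_limit):
    """Simulate FIFO backfill scheduling under a concurrency limit.

    `tables` is a list of (name, duration) pairs in the order the tables were
    added. Returns a list of (name, start_time) pairs.
    """
    if concurrency_limit < 1:
        raise ValueError("concurrency_limit must be at least 1")

    pending = deque(tables)
    running = []  # min-heap of (finish_time, name)
    starts = []
    now = 0
    while pending or running:
        # Start tables in added order while there is a free slot.
        while pending and len(running) < concurrency_limit:
            name, duration = pending.popleft()
            starts.append((name, now))
            heapq.heappush(running, (now + duration, name))
        # Advance time to the next backfill that finishes, freeing a slot.
        finish_time, _ = heapq.heappop(running)
        now = finish_time
    return starts
```

For example, `schedule_backfills([("a", 5), ("b", 2), ("c", 1), ("d", 1)], 2)` starts `a` and `b` immediately, starts `c` when `b` finishes, and starts `d` when `c` finishes.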

What happens to CDC changes during a backfill?

Our process continues to capture all changes to Kafka in the background. Once the backfill completes:
  1. We switch to the CDC stream
  2. We apply the queued changes in order
  3. The table transitions to a fully streaming state
This guarantees consistency and prevents stale data overwrites.
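
The replay step can be sketched as follows. This is a simplified model, not Artie's code: a `deque` stands in for the Kafka queue, a dict keyed by primary key stands in for the destination table, and the `(op, pk, row)` change format is an assumption made for illustration.

```python
from collections import deque


def apply_change(table, change):
    """Apply one CDC change, given as (op, primary_key, row), to a dict table."""
    op, pk, row = change
    if op == "delete":
        table.pop(pk, None)  # deleting a missing row is a no-op
    elif op in ("insert", "update"):
        table[pk] = row  # both are treated as upserts
    else:
        raise ValueError(f"unknown operation: {op}")


def backfill_then_replay(snapshot, queued_changes):
    """Load the backfilled snapshot, then replay queued changes in order."""
    destination = dict(snapshot)  # state written by the backfill
    queue = deque(queued_changes)  # changes captured while it ran
    while queue:
        apply_change(destination, queue.popleft())  # oldest change first
    return destination
```

Because the changes are applied oldest first, a row updated twice during the backfill ends with its latest value. In this model, upserts and deletes are also idempotent, so replaying a change that the snapshot already reflects does not alter the result.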