
How it works

Artie runs two parallel processes during a backfill:

1. Backfill process (historical data)
  • Scans the full table in batches and writes directly to the destination
  • Bypasses Kafka
  • Can use a read replica to minimize load on the primary database

2. CDC process (live changes)
  • Reads transaction logs immediately
  • Queues all changes (inserts, updates, deletes) in Kafka
  • Applies the queued changes after the backfill completes
Once the backfill finishes, Artie applies the queued CDC changes in order, bringing the destination up to a real-time state. This architecture follows Netflix’s DBLog pattern to prevent conflicts and minimize database impact.
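
To make the pattern concrete, here is a minimal, runnable Python sketch of the two-process idea. It is illustrative only, not Artie's implementation: in-memory lists stand in for the source table and its transaction log, a queue.Queue stands in for Kafka, and the destination is a plain dict.

```python
import queue
import threading

# In-memory stand-ins so the sketch runs end to end: `source` is the table
# being backfilled, `log` is its transaction log, `destination` is the target.
source = [{"id": i, "value": f"row-{i}"} for i in range(25)]
log = [
    {"op": "update", "id": 3, "value": "row-3-v2"},
    {"op": "delete", "id": 7},
]
destination: dict = {}

change_queue: queue.Queue = queue.Queue()  # stands in for Kafka
backfill_done = threading.Event()

def backfill(batch_size: int = 10) -> None:
    """Backfill process: scan the table in batches, write straight to the destination."""
    for offset in range(0, len(source), batch_size):
        for row in source[offset:offset + batch_size]:  # bypasses the queue entirely
            destination[row["id"]] = dict(row)
    backfill_done.set()

def capture_cdc() -> None:
    """CDC process: tail the transaction log and queue every change."""
    for change in log:  # inserts, updates, deletes
        change_queue.put(change)

def apply_queued_changes() -> None:
    """Once the backfill finishes, replay the queued changes in log order."""
    backfill_done.wait()
    while not change_queue.empty():
        change = change_queue.get()
        if change["op"] == "delete":
            destination.pop(change["id"], None)
        else:
            destination[change["id"]] = {"id": change["id"], "value": change["value"]}

cdc = threading.Thread(target=capture_cdc)
bf = threading.Thread(target=backfill)
cdc.start()
bf.start()
cdc.join()  # a real CDC stream never ends; this sketch's log is finite
apply_queued_changes()
assert 7 not in destination and destination[3]["value"] == "row-3-v2"
```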

Managing backfills

You can track the progress of your backfills in the Analytics Portal or in the Pipeline Overview page.
You can trigger an ad hoc backfill from the Pipeline Overview page by clicking on the Backfill tables button.
You can cancel a backfill from the Pipeline Overview page by clicking on the Cancel backfill button.

Questions

When do backfills occur?

  1. When you first launch a pipeline
  2. When you add new tables
  3. When you trigger an ad hoc backfill from the dashboard

How many tables backfill at once?

By default, 10 tables backfill in parallel per pipeline to keep load on the source database manageable. Tables move through two states:
  • Queued to backfill — Waiting in queue
  • Backfilling — Actively running
Parallelism is tunable for pipelines with high table counts (500+). Contact us to adjust it.

How are backfills ordered?

FIFO (first-in, first-out). Tables backfill in the order they were added, up to the concurrency limit.
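
As a rough illustration (not Artie's actual scheduler), a bounded worker pool exhibits both behaviors at once: Python's ThreadPoolExecutor drains its work queue first-in, first-out, so submission order determines backfill order while at most 10 tables run concurrently. The backfill_table function here is a hypothetical stand-in for the real per-table work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DEFAULT_PARALLELISM = 10  # default per pipeline; tunable for 500+ tables

def backfill_table(table: str) -> None:
    """Hypothetical stand-in for the per-table backfill work."""
    time.sleep(0.1)
    print(f"backfilled {table}")

def run_backfills(tables: list[str]) -> None:
    # A bounded pool drains its work queue first-in, first-out: the first 10
    # submissions start "Backfilling" immediately, while the rest wait in
    # "Queued to backfill" order until a worker frees up.
    with ThreadPoolExecutor(max_workers=DEFAULT_PARALLELISM) as pool:
        for table in tables:  # submission order == backfill start order
            pool.submit(backfill_table, table)

run_backfills([f"table_{i:03d}" for i in range(25)])
```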

What happens to CDC changes during a backfill?

Our process continues to capture all changes to Kafka in the background. Once the backfill completes:
  1. We switch to the CDC stream
  2. We apply queued changes in order
  3. The table transitions to a fully streaming state
This guarantees consistency and prevents stale backfill rows from overwriting newer changes.
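
A minimal sketch of that three-step cutover, using hypothetical stand-ins (change_queue, apply_change, stream_changes) rather than any real Artie API:

```python
import queue
from typing import Iterator

change_queue: queue.Queue = queue.Queue()  # changes captured during the backfill
table_state: dict = {}

def apply_change(change: dict) -> None:
    print("applied", change)

def stream_changes(table: str) -> Iterator[dict]:
    yield from []  # a real stream would tail the transaction log indefinitely

def cut_over(table: str) -> None:
    # 1. The backfill is done; switch from snapshot writes to the CDC stream.
    table_state[table] = "switching to CDC"
    # 2. Replay every change captured during the backfill, in log order, so
    #    queued updates and deletes land after the snapshot rows they modify.
    while not change_queue.empty():
        apply_change(change_queue.get())
    # 3. Fully streaming: changes now flow straight from the log.
    table_state[table] = "streaming"
    for change in stream_changes(table):
        apply_change(change)

change_queue.put({"op": "update", "id": 3, "value": "row-3-v2"})
cut_over("orders")
```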