Multi-Step Merge: Unlocking 3x Replication Throughput for Large Tables

Robin Tang

Updated on March 19, 2025

Product spotlight

Merge operations are notoriously expensive because they scan and rewrite large volumes of data, which results in higher latency for large tables. You can address this by scaling up the compute layer, adding resources so that data files are scanned and rewritten faster. However, this drives up compute costs and increases your total cost of ownership.

An alternative is to increase the size of each merge so that more data is processed in a single operation. This amortizes the overhead inherent in every merge and lets us replicate up to 3x more data without scaling up the compute layer.

What is "multi-step merge"?

Multi-step merge (MSM) lets us iteratively land data in an intermediary staging table. This way, we can build up a sufficiently large staging table before invoking a merge against the target table.

Previously, customers could configure merge frequency with the following settings, and we would trigger a merge as soon as any one of the thresholds was reached:

Flush time (in seconds)
Number of rows¹
Data size²
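The "whichever came first" rule above can be sketched as a simple predicate. This is a minimal illustration, not Artie's implementation: the `FlushPolicy` class, its field names, and `should_flush` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FlushPolicy:
    # Hypothetical field names; the actual setting names are not given in this post.
    max_seconds: float  # flush time threshold, in seconds
    max_rows: int       # number-of-rows threshold
    max_bytes: int      # data-size threshold


def should_flush(
    policy: FlushPolicy,
    elapsed_seconds: float,
    buffered_rows: int,
    buffered_bytes: int,
) -> bool:
    """Return True as soon as any one of the three thresholds is reached."""
    return (
        elapsed_seconds >= policy.max_seconds
        or buffered_rows >= policy.max_rows
        or buffered_bytes >= policy.max_bytes
    )
```

Because the conditions are combined with `or`, a quiet table still flushes once the time threshold passes, while a busy table flushes early on row count or size.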


Customers can now also specify a flush count, which controls how many times we land data in the staging table before merging into the target table.

*Customers with MSM enabled can now specify how often Artie should flush to the staging table before invoking a merge.*
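To make the role of flush count concrete, here is a toy in-memory model of the idea. It is a sketch under stated assumptions: `MultiStepMerger`, `flush`, and `merge_fn` are hypothetical names, the staging table is modeled as a Python list, and the real system operates on warehouse tables rather than in-process data.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class MultiStepMerger:
    """Toy model: land batches in staging, merge once every `flush_count` flushes."""

    def __init__(self, flush_count: int, merge_fn: Callable[[List[Row]], None]) -> None:
        if flush_count < 1:
            raise ValueError("flush_count must be at least 1")
        self.flush_count = flush_count
        self.merge_fn = merge_fn  # hypothetical stand-in for the merge into the target
        self.staging: List[Row] = []
        self.flushes_since_merge = 0

    def flush(self, batch: List[Row]) -> bool:
        """Land one batch in staging; return True if this flush triggered a merge."""
        self.staging.extend(batch)
        self.flushes_since_merge += 1
        if self.flushes_since_merge < self.flush_count:
            return False
        # Enough flushes have accumulated: run one large merge, then reset staging.
        self.merge_fn(self.staging)
        self.staging = []
        self.flushes_since_merge = 0
        return True
```

With `flush_count=3`, the first two calls to `flush` only grow the staging list, and the third call hands all accumulated rows to `merge_fn` in a single merge.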


When and why should you use MSM?

MSM is ideal for large tables that receive a high volume of changes (at least a few billion per month). By aggregating these changes into larger merge batches, MSM reduces the number of merges and, with it, the total merge overhead.
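A back-of-the-envelope model shows why fewer, larger merges help. It rests on a simplifying assumption that is not from the source: each merge pays a fixed cost (for example, scanning the target table) regardless of batch size, plus a cost proportional to the data landed. The function names and cost units are hypothetical, and the numbers in the usage note are illustrative, not measurements.

```python
import math


def merges_needed(num_flushes: int, flush_count: int) -> int:
    """Merges against the target table for a given number of staging flushes."""
    return math.ceil(num_flushes / flush_count)


def total_cost(
    num_flushes: int,
    flush_count: int,
    fixed_cost_per_merge: float,
    cost_per_flushed_batch: float,
) -> float:
    """Assumed cost model: a fixed cost per merge plus a per-batch variable cost."""
    merges = merges_needed(num_flushes, flush_count)
    return merges * fixed_cost_per_merge + num_flushes * cost_per_flushed_batch
```

For example, with 12 flushes, a fixed cost of 10 units per merge, and 1 unit per batch, `total_cost(12, 1, 10.0, 1.0)` is 132.0 while `total_cost(12, 4, 10.0, 1.0)` is 42.0. The variable work is unchanged; the savings come entirely from paying the fixed cost 3 times instead of 12.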


Key benefits are:

Increased throughput. MSM can increase replication throughput by up to 3x by reducing the number of merge operations.
Cost efficiency. You avoid having to scale up compute resources.
Reduced ingestion lag. Higher throughput helps replication keep up with incoming writes and minimizes latency between your source and destination.
