Merge operations are notoriously expensive because they require scanning and rewriting large volumes of data, which results in higher latency for large tables. One way to handle this is to scale up the compute layer by adding more compute resources to scan and rewrite data files faster. However, this drives up compute costs and increases your total cost of ownership.
An alternative is to increase the size of each merge so that more data is processed in a single operation. This strategy amortizes the overhead inherent in every merge and enables us to replicate up to 3x more data without scaling up the compute layer.
What is "multi-step merge"?
Multi-step merge (MSM) is a way for us to iteratively land data into an intermediary staging table. This lets us build up a large enough staging table before invoking a merge against the target table.
Customers could previously configure our merge frequency by specifying the following variables, and we would trigger a merge based on whichever came first:
- Flush time (in seconds)
- Number of rows [1]
- Data size [2]
Customers can now specify flush count, which controls how many times we land data into the staging table before merging into the target table.
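The interaction between the flush triggers and flush count can be sketched roughly as follows. This is a minimal illustration, not Artie's implementation: the class names, field names, and default thresholds (`FlushPolicy`, `Buffer`, `MultiStepMerger`, `flush_interval_seconds`, and so on) are all hypothetical, and de-duplication of staged rows is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class FlushPolicy:
    # Hypothetical setting names and defaults, for illustration only.
    flush_interval_seconds: float = 30.0
    max_rows: int = 50_000
    max_bytes: int = 64 * 1024 * 1024
    flush_count: int = 3  # flushes landed in staging before one merge


@dataclass
class Buffer:
    rows: int = 0
    size_bytes: int = 0
    seconds_since_flush: float = 0.0


def should_flush(buf: Buffer, policy: FlushPolicy) -> bool:
    """A flush fires on whichever threshold is reached first."""
    return (
        buf.seconds_since_flush >= policy.flush_interval_seconds
        or buf.rows >= policy.max_rows
        or buf.size_bytes >= policy.max_bytes
    )


@dataclass
class MultiStepMerger:
    policy: FlushPolicy
    staged_flushes: int = 0
    staged_rows: int = 0
    merges: list = field(default_factory=list)  # rows handled by each merge

    def flush(self, buf: Buffer) -> bool:
        """Land a buffer in staging; return True if this flush triggered a merge."""
        self.staged_rows += buf.rows  # simplification: no de-duplication
        self.staged_flushes += 1
        if self.staged_flushes < self.policy.flush_count:
            return False
        # Enough flushes accumulated: merge staging into the target, then reset.
        self.merges.append(self.staged_rows)
        self.staged_flushes = 0
        self.staged_rows = 0
        return True
```

In this sketch, a flush count of 1 merges on every flush, while a larger flush count lets the staging table grow across several flushes before a single merge runs.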

When and why should you use MSM?
MSM is ideal for large tables that receive a high volume of changes (at least a few billion per month). Aggregating these changes into larger merge batches reduces the number of merges, and with it the scan-and-rewrite overhead that each merge incurs.
Key benefits are:
- Increased throughput. MSM can increase replication throughput by up to 3x by reducing the number of merge operations.
- Cost efficiency. You avoid the need to scale up compute resources.
- Reduced ingestion lag. Higher throughput helps keep up with incoming writes and minimizes latency between your source and destination.
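To see why fewer, larger merges reduce overhead, consider a toy cost model in which every merge pays a fixed cost (for example, scanning the target table) plus a cost per changed row. The function and its parameters are hypothetical and the units are arbitrary. It is a simplification for intuition only, not a model of Artie's measured results.

```python
import math


def total_merge_cost(total_rows: int, rows_per_merge: int,
                     fixed_cost_per_merge: int, cost_per_row: int) -> int:
    """Toy model: cost = (number of merges * fixed cost) + (rows * per-row cost)."""
    merges = math.ceil(total_rows / rows_per_merge)
    return merges * fixed_cost_per_merge + total_rows * cost_per_row


# Same workload, merged in small vs. large batches (hypothetical numbers).
small_batches = total_merge_cost(900_000, 10_000, fixed_cost_per_merge=60, cost_per_row=0)
large_batches = total_merge_cost(900_000, 30_000, fixed_cost_per_merge=60, cost_per_row=0)
```

Under this model the per-row portion stays the same regardless of batch size. Only the fixed portion shrinks as the number of merges goes down, which is why the approach pays off most on large tables where that fixed cost dominates.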
How you can enable MSM with Artie
We are doing a phased rollout. If you'd like early access to the feature, please get in touch with us at [email protected].

[1], [2]: If we are merging, this value is de-duplicated. If we are appending (via history mode), it is not de-duplicated.