
Why destination type matters

Every flush carries a fixed overhead cost: staging table creation, time spent merging the data, and any cleanup work. The key to tuning flush rules is understanding how that fixed cost relates to your destination type.

Both OLAP and OLTP destinations follow the same general flush pattern (stage, merge, clean up); the difference is how much latency each step adds.

OLAP destinations (Snowflake, BigQuery, Databricks, Redshift) add significantly more latency per flush. Staging requires uploading files to cloud storage, DDL operations in a warehouse are slow, and, critically, the MERGE step requires a full table scan because OLAP databases don't have indexes on primary keys. Every flush must scan the entire destination table to find matching rows, regardless of how many rows are in the batch. With small batches, this fixed latency dominates:
Batch size    | Overhead | Merge time | Total | Per-row cost
1,000 rows    | ~5s      | ~0.1s      | ~5.1s | 5.1ms
100,000 rows  | ~5s      | ~2s        | ~7s   | 0.07ms
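The per-row figures in the table come from simple amortization of the fixed cost, which can be reproduced with a quick calculation (the overhead and merge times below are the table's illustrative values, not measurements):

```python
# Amortizing fixed flush overhead across the rows in a batch.
def per_row_cost_ms(rows, overhead_s, merge_s):
    """Total flush latency divided across the batch, in ms per row."""
    total_s = overhead_s + merge_s
    return total_s * 1000 / rows

print(per_row_cost_ms(1_000, 5, 0.1))   # ~5.1 ms/row
print(per_row_cost_ms(100_000, 5, 2))   # ~0.07 ms/row
```

The fixed ~5s is nearly identical in both cases; only the batch size changes, which is why larger batches are two orders of magnitude cheaper per row.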
OLTP destinations (PostgreSQL, MySQL, SQL Server) go through the same steps, but transactional databases have B-tree indexes on primary keys, so the MERGE uses fast index lookups instead of table scans. Combined with lightweight DDL operations, the fixed overhead per flush is low enough that smaller, more frequent flushes work well.
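The gap between the two MERGE strategies can be sketched with a toy cost model. This is an illustration of the asymptotics only, not how any of these databases actually execute MERGE:

```python
import math

# Toy comparison counts for the two MERGE strategies described above.

def merge_comparisons_olap(table_rows, batch_rows):
    # No primary-key index: every flush scans the whole destination
    # table, so cost is independent of how small the batch is.
    return table_rows

def merge_comparisons_oltp(table_rows, batch_rows):
    # B-tree index on the primary key: one O(log n) lookup per batch row.
    return batch_rows * math.ceil(math.log2(table_rows))

# 10M-row destination table, 1,000-row batch:
print(merge_comparisons_olap(10_000_000, 1_000))   # 10,000,000
print(merge_comparisons_oltp(10_000_000, 1_000))   # 24,000
```

Under this model the OLTP merge touches orders of magnitude fewer rows for small batches, which is why frequent small flushes are affordable there but not in a warehouse.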
For analytical databases like Snowflake, Databricks, BigQuery, or Redshift:
Setting flush rules too low can hinder throughput and cause latency spikes:
  • Fixed overhead costs: Each flush has connection/metadata overhead that dominates processing time with small batches
  • Inefficient resource usage: OLAP systems are designed for large parallel operations, not frequent micro-operations
  • Storage and query degradation: Many small files hurt compression, increase metadata lookups, and trigger excessive compaction

Recommended approach

Larger, less frequent flushes are optimal because:
  • Columnar storage benefits from batch processing
  • Reduced metadata overhead and better compression
  • More efficient query performance with fewer small files
Example configuration:
  • Rows: 100k-500k
  • Bytes: 50-500 MB
  • Time: 3-15 minutes
For tables with very high write throughput, multi-step merge can be enabled to support extremely large flush batches (1 GB+).
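To see how the three limits interact, here is a small sketch that computes which rule would fire first for a given incoming data rate. The function and its thresholds are illustrative (using the recommended OLAP ranges above), not Artie's actual flush logic:

```python
# Which flush rule triggers first for a steady incoming data rate?
# Thresholds default to mid-range OLAP recommendations: 250k rows,
# 100 MB, 5 minutes. Illustrative model only.

def first_trigger(rows_per_sec, bytes_per_sec,
                  max_rows=250_000, max_bytes=100 * 1024**2, max_secs=300):
    times = {
        "rows": max_rows / rows_per_sec,
        "bytes": max_bytes / bytes_per_sec,
        "time": float(max_secs),
    }
    rule = min(times, key=times.get)
    return rule, times[rule]

# A low-throughput table: the time rule fires long before rows or bytes.
print(first_trigger(rows_per_sec=100, bytes_per_sec=50_000))  # ('time', 300.0)
```

For low-throughput tables the time rule dominates, so raising row and byte limits alone will not change flush behavior there.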

Debugging high latency

If your pipeline latency is higher than expected, use the “Flush Count” graph in the analytics portal to identify which condition is triggering flushes (size, rows, or time), then adjust accordingly.
1. Check the flush reason

Look at the “Flush Count” graph to see which condition is triggering flushes.
2. If the reason is size or rows

Your flushes are triggering before enough data accumulates, producing small batches with high per-row overhead. Increase the limits toward the upper end of the recommended range:
  • Rows: increase toward 500k
  • Bytes: increase toward 500 MB
3. If the reason is time

Increase the time interval (e.g., from 3 minutes to 10-15 minutes). This may seem counterintuitive, but waiting longer allows more data to accumulate per flush, which increases overall throughput by amortizing the fixed merge overhead across a larger batch.
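A quick calculation shows why the longer interval helps, using the illustrative ~5s fixed overhead per flush from earlier (example figures, not benchmarks):

```python
# Fixed overhead consumed per hour at different flush intervals,
# assuming ~5s of fixed overhead per flush (illustrative figure).
overhead_per_flush_s = 5.0

for interval_min in (3, 10, 15):
    flushes_per_hour = 60 / interval_min
    wasted_s = flushes_per_hour * overhead_per_flush_s
    print(f"every {interval_min:>2} min: {flushes_per_hour:.0f} flushes/hr, "
          f"~{wasted_s:.0f}s/hr of fixed overhead")
```

Going from a 3-minute to a 15-minute interval cuts the hourly fixed overhead from ~100s to ~20s, while each flush carries five times as much data.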
For more details on how flush rules work, see the overview.

Monitoring

You can see which flush rule triggered each flush in the analytics portal:
[Image: Flush count graph in the Artie analytics portal showing the number of flushes triggered by each condition (size, rows, or time) over a selected time period]