Soft Partitioning
Soft partitioning is a logical partitioning strategy implemented at the ingestion layer rather than relying on native database partitioning features. Instead of storing all rows for a given entity in a single large table, we split data into time-based “partition” tables and expose them through a unified view.Currently Supported Destinations
- Redshift
- Snowflake (coming soon)
Why Soft Partitioning?
While some databases support native partitioning, others have limitations that make them ineffective for ingestion workloads:PostgreSQL
- Supports native partitioning with good performance
- Well-suited for both ingestion and query workloads
BigQuery
- Supports native partitioning and clustering, but partition pruning is most effective for query workloads rather than high-throughput ingestion.
- Partitioning in BigQuery is typically based on a timestamp or date column, but frequent ingestion of small batches can lead to many partitions and higher costs.
- Soft partitioning can help optimize ingestion performance and cost by controlling partition granularity and data routing at the ingestion layer.
- For instructions on native partitioning, see our BigQuery Partitioning guide.
Snowflake
- Has cluster keys that physically co-locate related rows in micro-partitions
- Re-clustering is async and can be expensive (consumes credits outside of a warehouse)
- Query pruning is less predictable
- More optimized for query performance than ingestion throughput
Redshift
- Has distribution keys (to control data placement across nodes) and sort keys (to allow pruning)
- More useful for joins and analytical queries
- Not well-suited for rapid ingestion and pruning for high throughput
- Predictable ingestion performance
- Explicit partition control
- Simpler data management
- Freedom from underlying database partitioning constraints
How It Works
Consider a tableuser_events
with the following columns:
1. Dynamic Table Creation
The partition scheme can be by month or day, and is keyed off thecreated_at
column. Artie will dynamically create tables for each partition period. For example, with monthly partitions:
2. Automatic Data Routing
As new data arrives, Artie writes each row to the appropriate partition table based on itscreated_at
timestamp. Future partitions (e.g., next 2-3 months) are pre-created and added to the unified view.
3. Unified View
Artie automatically maintains a view foruser_events
that unions all partitions and updates the definition as new partitions are created:
4. Compaction
As the number of partitions grows, Artie compacts old partitions by merging their data back into the_default
table.
You have the flexibility to specify how many previous partitions are retained before compaction is triggered.
Target Workloads
While soft partitioning can work for any datasets with a timestamp column, it’s particularly effective for:Time-Series, Append-Only Data
- Event streaming
- IoT sensor data
- Transaction logs
- Application metrics
High-Volume Ingestion Pipelines
- Where the cost of updating large tables is high
- Merge workloads that often target recent slices of data
- Scenarios requiring predictable ingestion performance
Benefits
Benefits of Soft Partitioning
Benefits of Soft Partitioning
- Faster ingestion and better performance: Smaller, time-based partitions enable quicker writes and allow queries to efficiently target specific time ranges, improving overall performance.
- Reduced lock contention: Operations on different partitions occur independently, minimizing blocking and improving concurrency.
- Simplified data management: Data is clearly separated by time periods, making it easier to manage, maintain, and apply flexible retention policies—old partitions can be dropped, compressed, or archived as needed.
- Lower storage and compute costs: Only relevant partitions are scanned during queries, reducing compute usage, and older data can be archived or compressed to save on storage.
- Predictable scaling and costs: Ingestion and query performance remain consistent as data volume grows, ensuring costs and resource usage are easier to forecast.
- Database agnostic: Soft partitioning works consistently across different database systems, providing operational flexibility.