Soft Partitioning

Soft partitioning is a logical partitioning strategy implemented at the ingestion layer rather than relying on native database partitioning features. Instead of storing all rows for a given entity in a single large table, we split data into time-based “partition” tables and expose them through a unified view.

Currently Supported Destinations

  • Redshift
  • Snowflake (coming soon)

Why Soft Partitioning?

While some databases support native partitioning, others have limitations that make it less effective for high-throughput ingestion workloads:

PostgreSQL

  • Supports native partitioning with good performance
  • Well-suited for both ingestion and query workloads

BigQuery

  • Supports native partitioning and clustering, but partition pruning is most effective for query workloads rather than high-throughput ingestion.
  • Partitioning in BigQuery is typically based on a timestamp or date column, but frequent ingestion of small batches can lead to many partitions and higher costs.
  • Soft partitioning can help optimize ingestion performance and cost by controlling partition granularity and data routing at the ingestion layer.
  • For instructions on native partitioning, see our BigQuery Partitioning guide.
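
For reference, native time partitioning in BigQuery is declared on the table itself. A minimal sketch (the dataset name and column types are illustrative, not generated by Artie):
CREATE TABLE analytics.user_events (
  uuid STRING NOT NULL,
  event_name STRING NOT NULL,
  user_id INT64 NOT NULL,
  created_at TIMESTAMP NOT NULL,
  properties JSON
)
PARTITION BY DATE(created_at)  -- one partition per day of created_at
CLUSTER BY user_id;            -- co-locate rows for the same user within each partition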

Snowflake

  • Has cluster keys that physically co-locate related rows in micro-partitions
  • Re-clustering is asynchronous and can be expensive (it consumes serverless credits outside of any warehouse)
  • Query pruning is less predictable
  • More optimized for query performance than ingestion throughput
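
Clustering in Snowflake is a table property, and re-clustering happens in the background. A minimal sketch (the clustering expression is illustrative):
ALTER TABLE user_events CLUSTER BY (TO_DATE(created_at));  -- co-locate rows by day
-- Automatic Clustering then re-sorts micro-partitions asynchronously,
-- consuming serverless credits independently of any running warehouse.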

Redshift

  • Has distribution keys (to control data placement across nodes) and sort keys (to allow pruning)
  • More useful for joins and analytical queries
  • Not well-suited to high-throughput ingestion that also needs efficient pruning
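
For context, distribution and sort keys in Redshift are fixed at table creation time. A minimal sketch using a user_events-style table (column types are illustrative; Redshift has no native UUID or JSONB types):
CREATE TABLE user_events (
  uuid VARCHAR(36) NOT NULL,
  event_name VARCHAR(255) NOT NULL,
  user_id BIGINT NOT NULL,
  created_at TIMESTAMP NOT NULL,
  properties SUPER
)
DISTKEY (user_id)      -- controls row placement across nodes
SORTKEY (created_at);  -- enables zone-map pruning on time filters
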
Soft partitioning aims to bridge these gaps by delivering:
  • Predictable ingestion performance
  • Explicit partition control
  • Simpler data management
  • Freedom from underlying database partitioning constraints

How It Works

Consider a table user_events with the following columns:
CREATE TABLE user_events (
  uuid UUID NOT NULL,
  event_name VARCHAR(255) NOT NULL,
  user_id BIGINT NOT NULL,
  created_at TIMESTAMP NOT NULL,
  properties JSONB,
  PRIMARY KEY (uuid, created_at)
);

1. Dynamic Table Creation

The partition scheme can be by month or day, and is keyed off the created_at column. Artie will dynamically create tables for each partition period. For example, with monthly partitions:
user_events_2025_08
user_events_2025_09
user_events_2025_10
...etc
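
Each partition table shares the schema of the base table. On Redshift, the generated DDL is conceptually equivalent to the sketch below (Artie emits the actual statements; the LIKE form is illustrative):
CREATE TABLE IF NOT EXISTS user_events_2025_09 (LIKE user_events_default);
CREATE TABLE IF NOT EXISTS user_events_2025_10 (LIKE user_events_default);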

2. Automatic Data Routing

As new data arrives, Artie writes each row to the appropriate partition table based on its created_at timestamp. Future partitions (e.g., next 2-3 months) are pre-created and added to the unified view.
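
Conceptually, each batch is split by the month of created_at and written to the matching partition table. A simplified sketch of one per-partition write on Redshift (the staging table name and exact MERGE shape are illustrative, not Artie's actual statements):
MERGE INTO user_events_2025_10
USING staging_user_events_2025_10 AS s  -- batch rows whose created_at falls in October 2025
ON user_events_2025_10.uuid = s.uuid
  AND user_events_2025_10.created_at = s.created_at
WHEN MATCHED THEN UPDATE SET
  event_name = s.event_name,
  user_id = s.user_id,
  properties = s.properties
WHEN NOT MATCHED THEN INSERT (uuid, event_name, user_id, created_at, properties)
  VALUES (s.uuid, s.event_name, s.user_id, s.created_at, s.properties);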

3. Unified View

Artie automatically maintains a view for user_events that unions all partitions and updates the definition as new partitions are created:
CREATE OR REPLACE VIEW user_events AS (
  SELECT
    uuid,
    event_name,
    user_id,
    created_at,
    properties,
    'user_events_default' AS artie_full_partition_name
  FROM user_events_default
  UNION ALL
  SELECT
    uuid,
    event_name,
    user_id,
    created_at,
    properties,
    'user_events_2025_08' AS artie_full_partition_name
  FROM user_events_2025_08
  UNION ALL
  SELECT
    uuid,
    event_name,
    user_id,
    created_at,
    properties,
    'user_events_2025_09' AS artie_full_partition_name
  FROM user_events_2025_09
  UNION ALL
  SELECT
    uuid,
    event_name,
    user_id,
    created_at,
    properties,
    'user_events_2025_10' AS artie_full_partition_name
  FROM user_events_2025_10
);
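
Queries treat user_events as a single table, and the artie_full_partition_name column records which partition each row came from. For example:
SELECT event_name, COUNT(*) AS events
FROM user_events
WHERE artie_full_partition_name = 'user_events_2025_10'  -- restrict to the October partition
GROUP BY event_name;
Whether the planner skips the other UNION ALL branches for a constant predicate like this depends on the destination, so filtering on created_at as well can help it prune.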

4. Compaction

As the number of partitions grows, Artie compacts old partitions by merging their data back into the _default table. You can configure how many previous partitions are retained before compaction is triggered.
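
A compaction pass for one expired partition is conceptually equivalent to the sketch below (Artie runs this and the corresponding view update automatically):
-- Fold the expired partition back into the default table
INSERT INTO user_events_default
SELECT uuid, event_name, user_id, created_at, properties
FROM user_events_2025_08;

-- Drop the partition table; the user_events view is then recreated without it
DROP TABLE user_events_2025_08;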

Target Workloads

While soft partitioning can work for any dataset with a timestamp column, it’s particularly effective for:

Time-Series, Append-Only Data

  • Event streaming
  • IoT sensor data
  • Transaction logs
  • Application metrics

High-Volume Ingestion Pipelines

  • Where the cost of updating large tables is high
  • Merge workloads that often target recent slices of data
  • Scenarios requiring predictable ingestion performance

Benefits