Skip to main content
Enable partitioning to lower your merge latency and reduce the amount of bytes scanned.

Steps to turn on partitioning

For this example, consider this table in Postgres.
CREATE TABLE events (
    event_id SERIAL PRIMARY KEY,
    event_name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
  1. First pause your Artie pipeline
  2. Recreate the table in BigQuery and make sure to use the right partitioning strategy.
-- Recreate the partitioned table
CREATE TABLE artie.events_copy PARTITION BY DATE(created_at) AS SELECT * FROM artie.events;

-- Drop the old table
DROP TABLE artie.events;

-- Rename new to old
ALTER TABLE artie.events_copy RENAME TO events;
  1. Edit your pipeline and update the table settings for events
  2. Click Save and Deploy
A partitioned table is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query. You partition tables by specifying a partition column which is used to segment the table.
  • Improve query performance by scanning a partition.
  • When you exceed the standard table quota.
  • Gain partition-level management features such as writing to or deleting partition(s) within a table.
  • Reduce the number of bytes processed + reduce your BigQuery bill
Partitioning typeDescriptionExample
Time partitioning

Partitioning a particular column that is a TIMESTAMP.

BigQuery allows hourly, daily, monthly, yearly and integer range partitioning intervals.

Column: timestamp
Partitioning granularity: daily

Integer range or interval based partitionsPartitioning off of a range of values for a given column.

Say you have a column called customer_id and there are 100 values.

You can specify to have values 0-9 go to one partition, 10-19 the next, etc.

Ingestion-based

This is when the row was inserted.

This is not recommended, because it requires storing additional metadata to know when this row was inserted. If we don’t specify this upon a merge, we will end up creating duplicate copies.

NA
I