Enable BigQuery Partitioning

Enable partitioning to lower your merge latency and reduce the amount of bytes scanned.

Steps to turn on partitioning

For this example, consider this table in Postgres.

CREATE TABLE events (
    event_id SERIAL PRIMARY KEY,
    event_name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

First pause your Artie pipeline
Recreate the table in BigQuery and make sure to use the right partitioning strategy.

-- Recreate the partitioned table
CREATE TABLE artie.events_copy PARTITION BY DATE(created_at) AS SELECT * FROM artie.events;

-- Drop the old table
DROP TABLE artie.events;

-- Rename new to old
ALTER TABLE artie.events_copy RENAME TO events;

Edit your pipeline and update the table settings for events
Click Save and Deploy

What is a partitioned table?

Why should we use table partitions?

What are the different kinds of partitioning strategies?

Partitioning type	Description	Example
Time partitioning	Partitioning a particular column that is a TIMESTAMP. BigQuery allows hourly, daily, monthly, yearly and integer range partitioning intervals.	Column: timestamp Partitioning granularity: daily
Integer range or interval based partitions	Partitioning off of a range of values for a given column.	Say you have a column called customer_id and there are 100 values. You can specify to have values 0-9 go to one partition, 10-19 the next, etc.
Ingestion-based	This is when the row was inserted. This is not recommended, because it requires storing additional metadata to know when this row was inserted. If we don’t specify this upon a merge, we will end up creating duplicate copies.	NA

Guides

​Steps to turn on partitioning

Steps to turn on partitioning