Skip to main content

Why Primary Keys Matter 🔑

Primary keys are crucial for data replication because they:
  1. Ensure Data Ordering: We use primary keys as partition keys in Kafka to guarantee the correct sequence of operations
  2. Maintain Data Integrity: Primary keys enable reliable MERGE operations to keep your data consistent

Adding Primary Keys: Two Common Scenarios 🛠️

Let’s say you have a users_no_pk table with a unique email field:
CREATE TABLE users_no_pk (
    email VARCHAR(255) UNIQUE NOT NULL,
    first_name VARCHAR(255) NOT NULL,
    last_name VARCHAR(255) NOT NULL
);
To make this table replication-ready, simply promote the unique email to a primary key:
ALTER TABLE users_no_pk ADD PRIMARY KEY (email);
For tables without any unique identifiers, we can add a new primary key column. Let’s use this example:
CREATE TABLE no_primary_keys (
    key VARCHAR(5),
    value bool
);

INSERT INTO no_primary_keys (key, value) VALUES ('foo', true), ('bar', false);
Current table contents:
postgres=# select * from no_primary_keys;
 key | value
-----+------
 bar | f    
 foo | f  
(2 rows)
Add a new primary key column:
ALTER TABLE no_primary_keys ADD COLUMN pk SERIAL PRIMARY KEY;
-- This automatically:
-- 1. Creates a new SERIAL column
-- 2. Backfills existing rows with sequential values
-- 3. Sets up auto-increment for new rows
After adding the primary key:
postgres=# select * from no_primary_keys;
 key | value | pk
-----+-------+----
 bar | f     |  2
 foo | f     |  1
(2 rows)
💡 Pro Tip: Your application code doesn’t need any changes! You can continue inserting data without specifying the primary key:
INSERT INTO no_primary_keys (key, value) VALUES ('qux', false);
postgres=# select * from no_primary_keys;
 key | value | pk
-----+-------+----
 bar | f     |  2
 foo | f     |  1
 qux | f     |  3
(3 rows)

Need Help? 🤝

Have questions about adding primary keys to your tables? Reach out to us at [email protected]!
I