August 28, 2025

Unifying Tables Across Schemas

For teams managing sharded or micro-sharded databases, downstream complexity multiplies fast. Each shard or schema produces its own copy of every table, so instead of M tables you end up with N × M tables in your warehouse (where N = number of shards/schemas and M = number of tables). Analysts are stuck stitching them back together, engineers write endless union queries, and operations teams lose the clean, consolidated view they need.

Artie now supports: Unifying Tables Across Schemas

You can now unify tables across schemas directly in replication. Instead of landing one table per schema, Artie automatically merges them into a single, consolidated destination table. Take an e-commerce platform sharding customers across 50+ schemas: instead of 50 separate users tables, you now get one unified users table downstream. Or a payments company splitting transactions across micro-shards: all those rows flow neatly into one transactions table in Snowflake. With Artie’s fan-in option, the number of downstream tables is simplified back to M, and schema evolution is handled automatically.

Why this matters:
  • Simplified data model: query one table instead of wrangling dozens, with schema evolution managed for you.
  • No duplication: eliminate manual unions or stitching scripts in the warehouse.
  • Consistent structure: unified naming across shards improves data quality and usability.
  • Effortless scaling: add new shards upstream, and they automatically merge downstream.
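For a feel of what this removes, here is a rough sketch of the before and after from the analyst's side. The shard, schema, and table names (shard_001 through shard_050, users, analytics.users) are placeholders for illustration, not anything Artie requires.

```python
# Hypothetical illustration: the per-shard UNION ALL analysts no longer need
# to maintain once shards fan in to a single destination table.
shards = [f"shard_{i:03d}" for i in range(1, 51)]

# Before fan-in: stitch 50 copies of the same table back together by hand.
union_query = "\nUNION ALL\n".join(
    f"SELECT *, '{schema}' AS source_schema FROM {schema}.users" for schema in shards
)

# After fan-in: Artie lands everything in one consolidated table.
unified_query = "SELECT * FROM analytics.users"

print(union_query.splitlines()[0])  # SELECT *, 'shard_001' AS source_schema FROM shard_001.users
print(unified_query)
```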

August 21, 2025

Microsoft Teams Notifications

Artie has long supported notifications via email and Slack. That worked fine, but for teams who live in Microsoft Teams, having alerts show up directly in their workspace is a big quality-of-life improvement.

Artie now supports: Microsoft Teams Notifications

You can now receive pipeline alerts and notifications directly in Microsoft Teams. Whether it’s replication status, schema changes, or operational alerts, everything flows into the same workspace your team already uses. This builds on our existing notification support (email and Slack), giving you another way to stay connected to your pipelines.

Docs 👉 How to enable Teams notifications

August 19, 2025

Large JSON Support for Redshift

JSON payloads are everywhere — but when they get large, most tools fall short. Many platforms land JSON into VARCHAR(MAX), which caps out at ~65k characters. For teams working with rich event logs, nested API responses, or anything more complex, that limit means truncated payloads and lost data.

Artie now supports: Large JSON support for Redshift

When Artie encounters large JSON payloads, we now land them as the SUPER data type in Amazon Redshift. SUPER supports documents up to 16MB in size, preserving the full payload without truncation. That means you can capture, query, and transform large, complex JSON documents in Redshift without compromise.

Why this matters:
  • Preserve entire JSON payloads instead of losing data to truncation
  • Unlock the full power of Redshift’s semi-structured query capabilities with SUPER
  • Handle large event logs, nested API responses, and other big JSON columns with ease
  • Avoid manual workarounds or post-processing to recover lost information
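To show what landing JSON as SUPER unlocks downstream, here is a minimal sketch of loading and querying a nested document in Redshift. The cluster details and the events/payload names are placeholders, and Artie handles the actual load; this only illustrates SUPER navigation on the destination side.

```python
# Minimal sketch: a nested JSON document stored as SUPER and queried with
# PartiQL-style dot notation. Connection details are placeholders.
import json
import psycopg2  # Redshift speaks the Postgres wire protocol

conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="analyst", password="...")
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS events (id BIGINT, payload SUPER)")

doc = json.dumps({"customer": {"id": 42, "items": [{"sku": f"A{i}"} for i in range(100)]}})
cur.execute("INSERT INTO events VALUES (%s, JSON_PARSE(%s))", (1, doc))

# Navigate into the nested document directly in SQL.
cur.execute("SELECT payload.customer.id FROM events WHERE id = 1")
print(cur.fetchone())
conn.commit()
```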
Working with big JSON in Redshift and want to see the difference? Reach out to our team — we’ll help you get set up.

August 14, 2025

Specifying Snowflake Roles

Some Snowflake service accounts are like Swiss Army knives — they have multiple roles, each with its own permissions and environment. Until now, Artie simply authenticated with the service account’s default role. That worked for straightforward setups, but for teams running multiple environments (like staging, pre-prod, and prod) from a single service account, it meant juggling credentials or sticking to a one-size-fits-all role.

Artie now supports: Specifying Snowflake roles

You can now tell Artie exactly which Snowflake role to use when authenticating with a service account.

Example: If your company runs staging, pre-prod, and prod in Artie, you can use a single service account for all of them — just assign a different role per environment. This keeps credentials simple while ensuring each environment only has access to what it needs.

Why this matters:
  • Simplifies credential management — no more creating and rotating multiple service accounts for different environments
  • Keeps environments isolated — staging stays staging, prod stays prod, even with the same account
  • Supports better security practices — roles can be scoped to the exact permissions needed
  • Reduces operational overhead — fewer accounts to configure, monitor, and maintain
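As a quick illustration of what role scoping does at the session level, here is a sketch using the Snowflake Python connector. The account, role, warehouse, and database names are placeholders; in Artie, the role is set on the destination configuration rather than in your own code.

```python
# One service account, scoped to a single environment by role.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-staging",
    user="ARTIE_SVC",
    password="...",
    role="ARTIE_STAGING_ROLE",   # staging-only permissions
    warehouse="STAGING_WH",
    database="STAGING_DB",
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_ROLE()")
print(cur.fetchone())  # ('ARTIE_STAGING_ROLE',)
```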

August 12, 2025

Sub-Second Pipeline Deployment

When you’re rolling out dozens (or hundreds) of pipelines at once, every second counts. The old 3–5 second deploy time per pipeline worked fine for smaller updates — but for large-scale rollouts, those seconds piled up fast, sometimes even hitting Terraform’s execution timeouts. That meant splitting deployments into batches, manually tracking progress, and adding friction to what should’ve been a quick rollout.

Artie now supports: Sub-second pipeline deployment

Pipeline deployments now complete in under 0.5 seconds each. Whether you’re launching 10 pipelines or 100+, they’ll deploy in a fraction of the time — keeping Terraform applies well within limits and eliminating the need for batching or manual retries.

Example: One customer managing 150+ pipelines can now spin up 40 new pipelines in under 20 seconds, with their largest rollouts finishing in minutes.

Why this matters:
  • Keeps Terraform applies under execution limits — no more timeout failures
  • Eliminates the need for splitting deployments into smaller batches
  • Cuts large-scale rollout time to seconds/minutes
  • Frees engineers from manual progress tracking and retries

August 7, 2025

Flush Metrics Now Available in Analytics

For teams replicating high-volume data into destinations like Snowflake, BigQuery, or Postgres, setting the right flush rules is key to balancing freshness, cost, and performance. But without visibility, tuning those rules can feel like guesswork.

Artie now supports: Flush Metrics in the Analytics dashboard

Flush rules let you control when data gets written from Artie’s streaming buffer to your destination — based on time intervals, row counts, or byte thresholds. With Flush Metrics, you now get a clear view into how those rules are performing.

Take a healthtech team syncing MySQL to BigQuery: they’ve configured a 60-second or 1MB flush rule. Now, they can see exactly how often data flushes, what triggered it, and how long it took — helping them optimize for cost and latency without guessing.

Why this matters:
  • Tune pipelines with data — not guesswork
  • Validate whether your flush rules are hitting SLAs
  • Optimize for warehouse costs by spotting over-frequent writes
  • Troubleshoot delays and fine-tune performance in seconds
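If it helps to picture what the metrics are measuring, here is a minimal sketch of the flush decision itself: write when a time interval, row count, or byte threshold is hit, whichever comes first. The FlushRule and Buffer classes and the row threshold are illustrative assumptions, not Artie's internal API; the time and byte thresholds mirror the 60-second / 1MB example above.

```python
# Sketch of the "flush when any threshold is crossed" logic that Flush Metrics
# now makes observable (what triggered each flush, and how often).
import time
from dataclasses import dataclass, field

@dataclass
class FlushRule:
    max_age_seconds: float = 60.0
    max_rows: int = 50_000
    max_bytes: int = 1_000_000  # ~1MB

@dataclass
class Buffer:
    rows: int = 0
    bytes: int = 0
    opened_at: float = field(default_factory=time.monotonic)

def flush_reason(buf: Buffer, rule: FlushRule) -> str | None:
    """Return what triggered the flush, or None if the buffer can keep filling."""
    if time.monotonic() - buf.opened_at >= rule.max_age_seconds:
        return "time_interval"
    if buf.rows >= rule.max_rows:
        return "row_count"
    if buf.bytes >= rule.max_bytes:
        return "byte_threshold"
    return None

print(flush_reason(Buffer(rows=100, bytes=2_000_000), FlushRule()))  # byte_threshold
```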
Head to the Analytics dashboard or read the docs for how to get started. Need help tuning? We’re here.

August 5, 2025

New Destination: Postgres

Not every use case belongs in a warehouse. Teams often need to move transactional data into Postgres to power real-time APIs, partner-facing systems, or operational dashboards — without the complexity of Snowflake or the fragility of DIY CDC scripts. Until now, Artie’s destinations focused on analytics platforms. But operational systems matter too — and we’re making sure you’re covered.

Artie now supports: Postgres as a destination

You can now stream changes from your source databases directly into Postgres — just like any other Artie pipeline. It’s fully managed, fault-tolerant, and handles schema changes and backfills automatically. Some teams are already using it to power internal tools by syncing MySQL to Postgres — skipping the warehouse entirely. Others are using Postgres-to-Postgres replication to isolate production workloads or build live replicas for disaster recovery. Same reliability. New destination.

Why this matters:
  • Power real-time APIs and dashboards without a warehouse
  • Eliminate fragile CDC scripts with a fully managed solution 
  • Sync across Postgres instances to isolate workloads or support disaster recovery
  • Handle schema changes and backfills automatically — no maintenance required
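For a sense of what a change event becomes on the Postgres side, here is a simplified sketch of an idempotent upsert keyed on the primary key. The table, columns, event shape, and connection string are hypothetical; Artie manages this merge logic (plus schema changes and backfills) for you.

```python
# A single change event applied to a Postgres destination as an upsert.
import psycopg2

event = {"op": "u", "id": 7, "email": "a@example.com", "status": "active"}

conn = psycopg2.connect("dbname=serving user=app password=... host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        INSERT INTO users (id, email, status)
        VALUES (%(id)s, %(email)s, %(status)s)
        ON CONFLICT (id) DO UPDATE
        SET email = EXCLUDED.email, status = EXCLUDED.status
        """,
        event,
    )
```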
Curious if Postgres fits your use case? Check the docs or reach out — we’ll help you get started.

July 29, 2025

Self-Serve DynamoDB Backfills

Backfills are a critical step in onboarding new pipelines — especially when you’re working with historical data in DynamoDB. Until now, Artie handled that part for you, kicking off a table export behind the scenes. That worked fine — unless something broke. If your AWS role didn’t have the right permissions or something else went sideways, users were left guessing. Now, that guesswork is gone.

Introducing: Self-Serve DynamoDB Backfills

When setting up a DynamoDB pipeline, you’ll now see a guided flow in the UI that helps you kick off a backfill. You can export the table directly from your AWS account — or select an existing export if you’ve already started one manually. No more opaque failures. If something’s wrong (like missing permissions), you’ll see it immediately and can fix it yourself — without waiting or wondering.

Why this matters:
  • DynamoDB backfills are now fully transparent and user-controlled
  • Errors (like missing permissions) are surfaced immediately for faster fixes
  • Reuse recent exports — no need to start from scratch
  • Smoother, more reliable onboarding for new DynamoDB pipelines
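If you prefer to start the export yourself, this is roughly what a manual DynamoDB table export to S3 looks like with boto3. The table ARN, bucket, and prefix are placeholders; the export requires point-in-time recovery to be enabled and your role needs the corresponding DynamoDB and S3 permissions.

```python
# Kick off a DynamoDB table export to S3, or list recent exports to reuse one.
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

resp = ddb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/orders",
    S3Bucket="my-export-bucket",
    S3Prefix="artie/orders/",
    ExportFormat="DYNAMODB_JSON",
)
desc = resp["ExportDescription"]
print(desc["ExportArn"], desc["ExportStatus"])

# Reuse an export you already started manually instead of starting from scratch.
for export in ddb.list_exports()["ExportSummaries"]:
    print(export["ExportArn"], export["ExportStatus"])
```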
Setting up a new DynamoDB pipeline? You’ll see the new backfill flow automatically. Questions? Reach out — we’re here to help.

July 22, 2025

Parallel Segmented Backfills for Postgres

CTID-based backfills are fast and efficient — especially for large, append-only Postgres tables. They scan directly by physical row location, often outperforming logical queries in stable datasets. But CTIDs come with tradeoffs: they’re slow to initialize for large tables, fragile in dynamic tables where rows update or move, and they can time out in environments with aggressive statement_timeout settings. For teams working with massive, constantly changing Postgres tables, these limitations can stall backfill progress or create reliability risks.

Artie now supports: Parallel Segmented Backfills for Postgres

Parallel Segmented Backfills offer an alternative path. Instead of relying on CTID, Artie slices tables into logical row segments based on integer primary keys — then parallelizes the work across those chunks. The result: similar performance to CTID backfills, but with stronger guarantees in dynamic environments.

We recently helped a customer backfill 8 billion rows in an actively updated Postgres table. CTID-based scans kept timing out and drifting. With Parallel Segmented Backfill, we split the workload across logical row ranges and completed the job — no timeouts, no skipped rows, no guesswork.

Why this matters:
  • Resilient to updates and vacuuming — row movement doesn’t break backfills
  • Offers CTID-level performance with better reliability under load
  • Avoids statement_timeout failures in large or busy tables
  • Makes backfill behavior predictable and tunable
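Here is a simplified sketch of the segmentation idea (illustrative only, not Artie's implementation): find the integer primary-key range, split it into fixed-size segments, and scan each segment with an ordinary indexed range query in parallel. The table, columns, segment size, and connection details are placeholders.

```python
# Split an integer PK range into segments and scan them in parallel.
from concurrent.futures import ThreadPoolExecutor
import psycopg2

DSN = "dbname=app user=replicator password=... host=db.internal"
SEGMENT_SIZE = 1_000_000

def copy_segment(lo: int, hi: int) -> int:
    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, payload FROM big_table WHERE id >= %s AND id < %s ORDER BY id",
            (lo, hi),
        )
        rows = cur.fetchall()
        # ... write rows to the destination ...
        return len(rows)

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute("SELECT min(id), max(id) FROM big_table")
    min_id, max_id = cur.fetchone()

segments = [(lo, min(lo + SEGMENT_SIZE, max_id + 1))
            for lo in range(min_id, max_id + 1, SEGMENT_SIZE)]

with ThreadPoolExecutor(max_workers=8) as pool:
    total = sum(pool.map(lambda seg: copy_segment(*seg), segments))
print(f"backfilled {total} rows across {len(segments)} segments")
```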

July 17, 2025

Improved DDL Support for Cell-Based Architectures

In environments with multiple isolated databases — like production, staging, and dev — schema drift is a persistent risk. Columns added in one cell might not appear in another unless there’s active data flowing through. That means teams looking at the “same” table in Snowflake could be seeing different structures, leading to confusion, bugs, and broken dashboards. This became especially painful for teams whose QA and Dev environments receive little to no traffic. With our previous behavior, tables wouldn’t update unless a row changed — leaving environments out of sync.

Artie now supports: Schema alignment across environments

We’ve introduced a new opt-in job that automatically checks and syncs table schemas across environments — even when there are no row changes. If a new column shows up in production, it’ll get added to dev and staging too, so all environments stay aligned. This feature ensures you get consistent schemas, no matter how much (or little) traffic a database gets.

Why this matters:
  • Guarantees column consistency across environments (prod, staging, dev)
  • Eliminates silent schema drift in low-traffic databases
  • Supports cell-based and single-tenant architectures out of the box
  • Reduces debugging time and improves trust in test environments
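Conceptually, the job works like a schema diff. Here is a rough, illustrative sketch of that idea against Snowflake's information_schema; the account details, schema names, and table are placeholders, and a real sync also has to handle data types, defaults, and more than two environments.

```python
# Illustrative only: diff one table's columns between two environment schemas
# and print the ALTERs that would bring the lagging one up to date.
import snowflake.connector

def columns(conn, schema: str, table: str) -> dict[str, str]:
    cur = conn.cursor()
    cur.execute(
        """SELECT column_name, data_type
           FROM information_schema.columns
           WHERE table_schema = %s AND table_name = %s""",
        (schema, table),
    )
    return dict(cur.fetchall())

conn = snowflake.connector.connect(account="myorg", user="ARTIE_SVC", password="...")

prod = columns(conn, "PROD", "USERS")
dev = columns(conn, "DEV", "USERS")

for name, data_type in prod.items():
    if name not in dev:
        # Column exists in prod but not dev: propose adding it.
        print(f'ALTER TABLE DEV.USERS ADD COLUMN "{name}" {data_type};')
```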

July 3, 2025

External Stage Support for Snowflake

Some teams need more control over where their data goes — and how it gets there. Maybe it’s for compliance. Maybe audit. Or maybe they just don’t want Snowflake touching their data until the very last step. By default, Artie loads delta files into Snowflake using internal staging before merging them into the target table. That worked fine for most workflows — but some teams need an extra layer of control over how data flows through their environment.

Now: Artie supports external staging.

You can configure Artie to write delta files to a Snowflake external stage — like your own S3 bucket — and we’ll read from there when applying changes to your target table. This gives organizations — like federal agencies using Snowflake Gov Cloud — the ability to use an external stage in their own environment, keeping data fully under their control for things like internal review, validation, or security scanning before deciding to merge into Snowflake. Same fully managed sync. Just with the files landing in your environment first.

Why this matters:
✅ You keep full visibility into what’s being staged before it’s merged
✅ You can retain delta files for auditing or reprocessing — entirely on your terms
✅ You get tighter control over when data crosses trust boundaries
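For context, here is a sketch of the Snowflake-side objects involved when delta files land in a bucket you own. The stage, storage integration, bucket, and paths are placeholders, and Artie performs the staging and merge itself; this only illustrates how an external stage keeps the files inspectable in your environment.

```python
# Create an external stage over your own bucket and inspect what's staged.
import snowflake.connector

conn = snowflake.connector.connect(account="myorg", user="ADMIN", password="...", role="SYSADMIN")
cur = conn.cursor()

# An external stage pointing at a bucket you own and control.
cur.execute("""
    CREATE STAGE IF NOT EXISTS artie_delta_stage
      URL = 's3://my-company-artie-deltas/'
      STORAGE_INTEGRATION = my_s3_integration
""")

# Review, scan, or retain the staged delta files before anything is merged.
cur.execute("LIST @artie_delta_stage/users/")
for name, size, *_ in cur.fetchall():
    print(name, size)
```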
There’s no impact on performance. No extra cost. Just more flexibility, when you need it. Want to turn this on? Let us know — we’ll help you get set up.

June 26, 2025

Column Control: Include, Exclude, Hash

Not every column needs to make it to your warehouse.

Some fields are sensitive. Some are noisy. Some just don’t belong anywhere near analytics.
Now, you can decide exactly what gets replicated — and what doesn’t — with Artie’s expanded column-level controls.
Here’s what’s now possible, per column:
✅ Inclusion — define an allowlist. Only replicate what you explicitly approve; otherwise, ignore
🚫 Exclusion — let most of the table through, but block the columns you don’t want downstream
🔐 Hashing — keep the structure, mask the value. Track fields like user IDs, without exposing data

Why this matters:

This isn’t just a cleanup job. It’s control over what leaves prod.

Inclusion

Sometimes, it’s not about removing sensitive fields — it’s about only sending the ones you trust. Inclusion rules flip the default: instead of replicating everything and hoping exclusions or hashes catch the risky stuff, you define exactly what gets through — and block the rest.

What this means:
  • Safer by default — no surprises when new columns show up
  • Compliance-friendly — ideal for PII and financial data
  • Cleaner data — only the fields analytics and ML teams actually need
Exclusion

Most of the table is fine. But those one or two fields? No thanks.
Exclusion rules let you drop what doesn’t belong — without touching your schema.
Use it when:
  • You’re skipping internal metadata, debug fields, or legacy junk
  • You want to trim without breaking things
  • You’re migrating slowly and need a guardrail, not a wall
Hashing

Some fields need to be trackable — but not readable.
Hashing keeps them in your pipeline without exposing what’s inside.
Reach for hashing when:
  • You want to track user behavior across systems without exposing identity
  • You’re debugging and need to confirm values match across systems — without logging sensitive data
  • You’re sharing a warehouse and want to prevent exposing raw PII to teams that don’t need it
  • You only need to know whether or not a value has changed
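Here is a small illustration of why hashed columns stay useful: equal inputs hash to equal outputs, so joins, change detection, and cross-system comparisons still work. The SHA-256 and salt choices below are assumptions made for the sketch, not a statement of Artie's exact hashing scheme.

```python
# The column keeps a stable, comparable value; the raw data never leaves prod.
import hashlib

SALT = b"rotate-me"  # illustrative salt, not Artie's

def hash_column(value: str) -> str:
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

a = hash_column("user_12345@example.com")
b = hash_column("user_12345@example.com")
c = hash_column("user_67890@example.com")

print(a == b)  # True  -- same input hashes identically, so tracking and joins still work
print(a == c)  # False -- changes are detectable without exposing the raw value
```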
Want to configure it? 
Column-level rules are set at the source. This guide explains where they belong and why.

June 19, 2025

CDC for Tables Without Primary Keys

Some tables are weird. No primary key (PK), maybe just a unique index or some composite hack someone added in 2017. Until now, those were off-limits for replication.

You can now override PK requirements by specifying a unique index — including composite indexes. Artie will respect the exact column order to ensure optimal performance.

Why index-based PK overrides matter:

Not every table has a clean PK. Some use unique indexes or composite keys that aren’t formally declared as PKs. Until now, these tables were difficult (or impossible) to replicate. This change addresses one of the most common blockers for CDC at scale.

What’s changed:
  • PK override: Define row identity with a unique index
  • Use composite keys — even if unofficial or unenforced
  • Preserve the exact index column order – it affects how changes are captured and impacts query performance during replication (e.g., email, account_id, created_at)
This unlocks flexible replication for legacy systems, denormalized tables, and high-volume sources — without compromising performance. When to use index-based keys:
  • Your table lacks a formal PK, but has a unique constraint or index
  • You rely on composite keys to identify rows
  • You’re dealing with legacy systems or data models that weren’t built with CDC in mind
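To make the column-order point concrete, here is a toy sketch of keying change events on a composite unique index and letting the latest event win. The event shape and LSN field are hypothetical; the key columns follow the (email, account_id, created_at) example above.

```python
# Change events keyed on a composite unique index, latest event per key wins.
events = [
    {"email": "a@x.com", "account_id": 1, "created_at": "2025-06-01", "lsn": 101, "status": "new"},
    {"email": "a@x.com", "account_id": 1, "created_at": "2025-06-01", "lsn": 107, "status": "paid"},
    {"email": "b@x.com", "account_id": 2, "created_at": "2025-06-02", "lsn": 103, "status": "new"},
]

KEY_COLUMNS = ("email", "account_id", "created_at")  # exact index order matters

latest: dict[tuple, dict] = {}
for event in sorted(events, key=lambda e: e["lsn"]):
    key = tuple(event[c] for c in KEY_COLUMNS)
    latest[key] = event  # later events replace earlier ones for the same row

for key, row in latest.items():
    print(key, row["status"])  # the a@x.com row ends up 'paid'
```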
How to enable key overrides: Reach out to enable key overrides — we’ll help define your index logic and validate it during setup.

June 17, 2025

Backfill Tuning: Picking the Right Batch Size

You can now control how many rows Artie processes at a time during backfills. The default is now 25,000 rows per chunk (up from 5,000), but you can tune this based on performance vs. load tradeoffs.

Why backfill batch size matters:

Backfills aren’t one-size-fits-all. Some teams want speed. Others are sensitive to database load and tiptoeing around a production DB at 2am. Until now, everyone got the same batch size of 5,000 rows per chunk. Now you can tune backfills to match your style:
  • The default is 25,000 rows — we benchmarked a bunch of sizes. 25,000 rows won out. So that’s our new default
  • You have control — adjust the batch size to fit your environment
How to tune batch size based on your workload:
  • Speed up backfills: Larger chunks = fewer queries, can improve throughput, but overly large chunks can backfire. It’s about finding balance.
  • Reduce DB load: Smaller chunks = faster queries, lower impact on source
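As a mental model, the chunk size is the LIMIT on each backfill read. Here is a hedged sketch of a keyset-paginated backfill loop with a tunable batch size; the table, columns, and connection string are placeholders, not Artie's internals.

```python
# Keyset-paginated backfill reads: smaller batches are gentler on the source,
# larger batches mean fewer round trips (up to a point).
import psycopg2

BATCH_SIZE = 25_000  # the new default; tune to your environment

def backfill(dsn: str) -> int:
    total, last_id = 0, 0
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        while True:
            cur.execute(
                "SELECT id, payload FROM orders WHERE id > %s ORDER BY id LIMIT %s",
                (last_id, BATCH_SIZE),
            )
            rows = cur.fetchall()
            if not rows:
                return total
            # ... ship rows to the destination ...
            last_id = rows[-1][0]
            total += len(rows)
```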
Need help tuning batch size? If you’re unsure what batch size is right for your workload, reach out — we’ll help you tune it.

Read Once and Write to Multiple Destinations

You can now sync data from a single database to multiple destinations — all from the same connector.

Why this matters:
  • Reduce load on production databases by avoiding duplicate reads and minimizing replication slot overhead
  • Ability to fan out to multiple tools — e.g., write to both Snowflake and Redshift
  • Ability to support diverse use cases in parallel — analytics, ML, real-time alerting
This feature is designed for organizations that:
  • Operate across multiple data platforms 
  • Serve many internal teams with different tools
  • Need to scale data infrastructure without increasing operational burden
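Conceptually, the fan-out looks like this: one read loop over the source's change stream, with each event handed to every destination writer. The writer classes below are stand-ins for illustration, not Artie's connector API.

```python
# One read from the source, many writes downstream.
from typing import Iterable, Protocol

class Writer(Protocol):
    def write(self, event: dict) -> None: ...

class SnowflakeWriter:
    def write(self, event: dict) -> None:
        print("snowflake <-", event["id"])

class RedshiftWriter:
    def write(self, event: dict) -> None:
        print("redshift  <-", event["id"])

def replicate(change_stream: Iterable[dict], writers: list[Writer]) -> None:
    for event in change_stream:          # a single read / single replication slot
        for writer in writers:           # fan out to every destination
            writer.write(event)

replicate(({"id": i} for i in range(3)), [SnowflakeWriter(), RedshiftWriter()])
```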
If you’re planning a multi-destination architecture, we’d be happy to help — just reach out.

June 2, 2025

Iceberg Support Using S3 Tables

This launch adds something big: support for Apache Iceberg using S3 Tables. Artie customers can now:
  • Stream high-volume datasets into Iceberg-backed tables stored on S3
  • Use S3 Tables’ fully managed catalog, compaction, and snapshot management
  • Query efficiently with Spark SQL (via EMR + Apache Livy) without wrestling with cluster glue
  • Get up to 3x faster query performance thanks to automatic background compaction
Why is Iceberg a big deal? Because it solves what’s frustrating and limiting about traditional S3-based data lakes. Hive tables are rigid and brittle, with no snapshotting or time travel. Delta Lake is powerful but tied to the Databricks ecosystem. Plain S3 file storage? No metadata layer, no transactions, no query optimizations. Instead, Iceberg gives you a fully open, cloud-native table format with smooth schema evolution, hidden partitions, snapshot isolation, and time-travel queries – all with broad engine support (Spark, Trino, Flink, Presto, Hive).

We’re excited about this because it means Artie customers can confidently move massive data volumes without needing to hand-build the plumbing – Iceberg and S3 Tables handle schema changes, partitioning, compaction, and snapshot management behind the scenes, so the system scales cleanly without brittle, custom workflows.

📚 Want to set up Iceberg-backed pipelines? Docs to get started: https://artie.com/docs/destinations/iceberg/s3tables
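Once a pipeline is landing data in Iceberg, querying it from Spark looks like ordinary SQL plus Iceberg's metadata tables. The sketch below assumes a Spark session whose Iceberg catalog (called mycatalog here) is already wired up to S3 Tables per the docs above; the namespace and table names are placeholders.

```python
# Ordinary reads, snapshot history, and time travel against an Iceberg table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-reads").getOrCreate()

# Ordinary reads against the Iceberg-backed table.
spark.sql("SELECT count(*) FROM mycatalog.analytics.orders").show()

# Iceberg exposes snapshot history as a metadata table ...
spark.sql("SELECT snapshot_id, committed_at FROM mycatalog.analytics.orders.snapshots").show()

# ... which enables time-travel queries back to an earlier state.
spark.sql(
    "SELECT count(*) FROM mycatalog.analytics.orders TIMESTAMP AS OF '2025-06-01 00:00:00'"
).show()
```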

May 14, 2025

S3 Iceberg destination (Beta)

S3 Iceberg is now available in beta! This new destination uses AWS’s recently released S3 Tables support, allowing you to replicate directly into Apache Iceberg tables backed by S3. It’s a big unlock for teams building modern lakehouse architectures on open standards.

Column Inclusion Rules

You can now define an explicit allowlist of columns to replicate - ideal for PII or other sensitive data. This expands our column-level controls alongside column exclusion and hashing. Only the fields you specify get replicated. Everything else stays out.

Autopilot for New Tables

Stop manually hunting for new tables in your source DB. Autopilot finds and syncs them for you - zero config required. Turn it on via: Deployment → Destination Settings → Advanced Settings → “Auto-replicate new tables”

Data Quality: Rows Affected Checks

To further enhance the data integrity built into our pipeline, we’ve added another guardrail: verifying the number of rows affected during each database operation. For example, during merge steps (such as in Snowflake), we confirm ROWS_LOADED from copy commands and validate the totals for inserted, updated, or deleted rows. This approach reinforces the robustness of our data replication process and is another way we catch issues early and ensure replication integrity.
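As a simple illustration of the idea, here is a hedged sketch of a rows-affected check around a single write; the table, values, and expected count are hypothetical, and Artie's own checks run inside its copy and merge steps rather than in user code like this.

```python
# Compare the row count the database reports against what we expected to write.
import psycopg2

expected = 3

with psycopg2.connect("dbname=dest user=loader password=... host=localhost") as conn:
    with conn.cursor() as cur:
        cur.execute("INSERT INTO items (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c')")
        if cur.rowcount != expected:
            raise RuntimeError(f"expected {expected} rows affected, got {cur.rowcount}")
```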

Read Once, Write Many

We recently launched the ability to read-once and write to multiple destinations. This means you no longer need multiple replication slots on your source database. For example, by reading data just once from your Postgres instance and simultaneously replicating it to Snowflake and Redshift, you reduce database overhead and simplify replication architecture.

Multi-Data Plane Support

Artie now supports hosting pipelines across multiple data planes, whether you’re on our cloud or using your own (BYOC) infrastructure. For example, run one pipeline from Postgres to Snowflake in AWS US-East-1 and another from MySQL to Snowflake in AWS US-West-2.

Oracle Fan-in

With our Oracle Fan-in feature, you can now easily replicate data from thousands of Oracle sources - without painful manual setups or infrastructure overload. Fan-in reduces your Kafka topic sprawl, lowers infrastructure costs, and simplifies real-world, complex data replication.