In this blog post, we'll go over several improvements we have made to our MySQL connector. These updates streamline database operations, minimize disk usage and I/O activities, and automate data synchronization tasks without complex configuration requirements and management overhead.
Key improvements include:
- Full DDL support without the need to increase
binlog_row_metadata
to be FULL - gh-ost migration support for non-disruptive schema changes
- Automatic fan in for partitioned tables
- GTID support for reliable transaction replication
Full DDL Support
Artie’s new MySQL connector supports full DDL without requiring binlog_row_metadata
to be set to FULL. This is key for avoiding increase in disk usage or higher I/O.
As a quick summary, there are a mixture of DDLs and DMLs within MySQL's binary log (binlog). You can configure the amount of table metadata that gets logged within each row by specifying binlog_row_metadata
settings.
What is binlog_row_metadata
?
This is a system variable that configures the amount of metadata that is added to the binary log when row-based logging is configured.
The valid values you can configure are:
What are the implications of enabling FULL?
There is more information being logged to binary logs which will increase disk usage. This will also result in higher I/O in MySQL topology as there is more data transferred and data to parse out.
How does Artie support DDLs without requiring binlog_row_metadata
to be FULL?
Schema changes like CREATE, ALTER, and DROP are automatically recorded in MySQL's binary logs, regardless of binlog_row_metadata
setting. So, we built a custom SQL parser by leveraging ANTLR that allows us to analyze the SQL scripts and extract schema changes. This also means that we are able to pick up changes like table renames which are crucial in supporting gh-ost migrations (more on gh-ost migrations in the next section).
gh-ost Migration
With our gh-ost migration support, we ensure that online schema changes are conducted without disruptive locks, allowing normal operations and access to the table during the migration process.
gh-ost, which stands for GitHub Online Schema Migration Tool, is a popular open-source framework that allows you to perform online schema changes.
Schema migrations are particularly an issue with MySQL, because MySQL's native ALTER TABLE can lock an entire table, which may block writes and reads.
How does gh-ost work?
- gh-ost begins by preparing a ghost table and incrementally copies the original table's data into it in chunks
- Next, it applies the intended schema changes and alters the ghost table
- The process is set up as a MySQL replica, which allows it to replay binary log events into the ghost table, keeping it synchronized with the original table
- Finally, gh-ost executes an atomic operation that renames and swaps the original and ghost tables, effectively replacing the old table with the updated one without locking the table for use
How does Artie support this out of the box?
Artie monitors schema changes across the MySQL database and only tracks changes for pre-configured tables. Upon detecting a new ghost table, it will automatically add this table to Artie’s in-memory schema adapter.
Our custom-built SQL parser then identifies any incoming table rename events and automatically updates the in-memory schema adapter to reflect this change, ensuring seamless integration of the new table schema.

Automatic fan in for partitioned tables
Artie enhances MySQL deployments by offering an advanced automatic fan-in feature for partitioned tables.
This functionality allows us to intelligently detect new partitions as they are created. Once detected, Artie seamlessly aggregates all changes from these partitions and funnels them into a single specified topic.
This streamlines data consumption and integration processes, making it significantly easier for downstream systems to process data without having to manage multiple partition feeds. This unified approach not only simplifies the data architecture but also improves performance and reduces the overhead associated with handling partitioned data.

GTID support
Enabling GTIDs in MySQL offers significant advantages, including easier failover and recovery processes, and ensuring that there are no duplicate or missed transactions during data replication. GTIDs are supported by Artie out-of-the-box.
What is GTID?
A Global Transaction Identifier (GTID) is a unique identifier for each transaction in a MySQL database environment. GTIDs look like this:
GTID = source_id:transaction_id
Each GTID consists of two parts: the source_id
and the transaction_id
.
- The
source_id
corresponds to the server_uuid of the originating server, identifying where the transaction originated - The
transaction_id
is a sequential number that tracks the order of transactions as they are executed on the source server
Together, they ensure that every transaction can be uniquely identified across all servers in a MySQL topology. This unique identification is crucial for maintaining a consistent and reliable replication process across different servers.
What is a GTID set?
A GTID set is a collection of one or more GTIDs used to represent a group of transactions within a MySQL database system.
A GTID set can combine multiple GTIDs by listing them sequentially, separated by commas, or by using a range notation to consolidate consecutive transaction IDs from the same source. This set notation is essential for efficiently tracking and replicating groups of transactions across servers in a MySQL topology.
For example:
source_id:transaction_id-transaction_id
3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5
The notation 3E11FA47-71CA-11E1-9E33-C80AA9429562:
1-5
indicates a set of five transactions, from transaction ID 1 to 5, all originating from the server with the UUID 3E11FA47-71CA-11E1-9E33-C80AA9429562
.
How do you enable GTIDs?
To enable GTIDs for replicating changes in MySQL, you will need to set these variables to be ON:
- ENFORCE_GTID_CONSISTENCY
- GTID_MODE
Artie will automatically check that these two variables are configured correctly and use GTIDs to replicate changes seamlessly.