
Watch Out When Running Out of Disk Space With InnoDB Group Replication Cluster

Oracle MySQL Community Edition comes with a native, virtually synchronous clustering solution called InnoDB Cluster. Under the hood, this clustering solution uses a technology called Group Replication. It is very similar to Galera Cluster, although where Galera relies on its own wsrep (write-set replication) concepts, InnoDB Cluster uses native MySQL concepts such as binary logs, GTIDs and more.

InnoDB Cluster has been available since version 5.7; however, it's much improved in 8.0. A best-practice setup is a minimum of three nodes running MySQL 8.0. This is the setup I was running when I encountered the issue that triggered this blog post.

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 14c7f5b3-67d2-11eb-9e96-08002799f743 | node0       |        3306 | ONLINE       | PRIMARY     | 8.0.23         |
| group_replication_applier | fffa336e-67d2-11eb-a791-08002799f743 | node1       |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
| group_replication_applier | 89170bb4-67d3-11eb-91eb-08002799f743 | node2       |        3306 | ONLINE       | SECONDARY   | 8.0.23         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+

Introducing a failure to the cluster

One of the observations I recently made in an environment I was managing was that one of the nodes was running out of disk space for storing binary logs. So in my testing environment, I chose to become a bad DBA and wreak some havoc on my nicely operating cluster.

node1:~ # dd if=/dev/zero of=/mysql/bin/junk
dd: writing to '/mysql/bin/junk': No space left on device
1815769+0 records in
1815768+0 records out
929673216 bytes (930 MB, 887 MiB) copied, 10.3299 s, 90.0 MB/s

MySQL is aware of the situation:

2021-02-07T10:07:54.879371Z 14 [ERROR] [MY-000035] [Server] Disk is full writing '/mysql/bin/node1-bin.000002' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.

At this point node1 can no longer commit any transactions to its local copy of the data, as it's configured with sync_binlog = 1. What I've effectively done is introduce replication lag into my synchronous cluster.
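
If you want to double-check how a node is configured in this respect, a query along these lines does the trick (this is just the kind of check I'd run, not captured output from my test cluster):

mysql> SELECT @@global.sync_binlog, @@global.log_bin, @@global.log_bin_basename;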

What troubled me in this situation is that the cluster does not seem to be aware of it. The status of node1 hasn't changed at all: it still reports ONLINE, just like in the output at the beginning of this post. As for cluster operations, we have two out of three nodes working fine, one as PRIMARY and the other as SECONDARY. So even with the "broken" node, our cluster should be operating fine, which it actually does.
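
The lag isn't completely invisible, though. Group Replication exposes per-member queue counters in performance_schema, and a query like the one below (a sketch, not output from my test run) should show the remote applier queue on node1 growing while the other members stay close to zero:

mysql> SELECT MEMBER_ID,
    ->        COUNT_TRANSACTIONS_IN_QUEUE,
    ->        COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
    ->   FROM performance_schema.replication_group_member_stats;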

Wreaking some more havoc

With node1 falling further and further behind in replication, I decided to test the cluster's response to failure by taking down my other SECONDARY node (node2). Rather than cleanly shutting down MySQL, I chose to kill -9 the instance. To prevent systemd from restarting it, I also ran systemctl stop mysql.

My replication group noticed the change and removed node2 from the group.

2021-02-07T10:20:29.130938Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address node2:3306 has become unreachable.'
2021-02-07T10:20:45.956837Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: node2:3306'
2021-02-07T10:20:45.957419Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to node0:3306, node1:3306 on view 16126893147346366:12.'

At this time the cluster status was as follows:

  • node0: PRIMARY – ONLINE – processing transactions
  • node1: SECONDARY – ONLINE – NOT processing transactions
  • node2: DOWN

I assumed the group would lose quorum, since node0 is proposing transactions but there's no node available to (reliably) certify them. In practice it did not: node1 is still reachable at the group-communication level, so the two remaining members keep their majority and node0 keeps committing, even though node1 isn't actually applying anything.
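
A rough way to sanity-check the majority from any member's point of view is to count members per state; if a member sees more than half of the current view as UNREACHABLE, the group can no longer reach consensus and writes will block. This is a sketch of the kind of query I'd use, not output from my test run:

mysql> SELECT MEMBER_STATE, COUNT(*) AS members
    ->   FROM performance_schema.replication_group_members
    ->  GROUP BY MEMBER_STATE;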

Relying on binary logs

In this situation, if we simply restart node2, it will be able to connect to node0, fetch the binary logs it's missing and perform incremental recovery to bring itself back to a healthy state.
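
Before restarting a joiner, it's worth checking whether incremental recovery from a given donor is even possible: every transaction the joiner is missing must still be in the donor's binary logs, i.e. not part of its gtid_purged. On node0 that boils down to something like this (a sketch, not output from my test run):

mysql> SHOW BINARY LOGS;
mysql> SELECT @@global.gtid_executed, @@global.gtid_purged\G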

But what if node0 no longer had the binary logs needed to do that? To simulate this, on node0 we purge the binary logs that node2 needs.

MySQL  node0:33060+ ssl  JS > \sql FLUSH BINARY LOGS;
Query OK, 0 rows affected (0.0983 sec)

MySQL  node0:33060+ ssl  JS > \sql PURGE BINARY LOGS TO 'node0-bin.000002';
Query OK, 0 rows affected (0.0234 sec)

After this we start node2 again and we watch the error log:

2021-02-07T10:23:03.792160Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.23) starting as process 28032
...
2021-02-07T10:23:06.220509Z 0 [System] [MY-013587] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' is starting.'
...
2021-02-07T10:23:07.236538Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.23'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server - GPL.
...
2021-02-07T10:23:10.078442Z 2 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address node0:3306.'
2021-02-07T10:23:11.079997Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2021-02-07T10:23:11.080484Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to node0:3306, node2:3306, node1:3306 on view 16126893147346366:13.'
...
2021-02-07T10:23:22.075506Z 22 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_15973@node1:3306',replication started in log 'FIRST' at position 4

We see that node2 connects to node1 for incremental recovery. There's no reason why it shouldn't, since node1 reports as ONLINE with no issues. In reality, however, node1 has not been replicating or committing transactions, and thus has not been generating binary logs. As a result, node2 is stuck in RECOVERING status until I fix node1 and hopefully let it catch up with the group.
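
While node2 is stuck, this is visible both in the member state and in the recovery channel itself; something along these lines (a sketch, not captured output) shows node2 sitting in RECOVERING and the recovery applier waiting on node1:

mysql> SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
    ->   FROM performance_schema.replication_group_members;
mysql> SHOW REPLICA STATUS FOR CHANNEL 'group_replication_recovery'\G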

All this time, node0 has been writing and committing without any issues, while these writes were not being replicated anywhere. In my opinion, in an InnoDB Cluster environment, this is problematic.
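
If you want to quantify how far behind node1 really is, you can compare GTID sets manually: copy @@global.gtid_executed from node0, then subtract node1's own set from it on node1. The placeholder below is something you'd paste in yourself, not a value from my test cluster:

mysql> -- on node0:
mysql> SELECT @@global.gtid_executed;
mysql> -- on node1, using the set copied from node0:
mysql> SELECT GTID_SUBTRACT('<gtid_executed copied from node0>', @@global.gtid_executed)
    ->        AS transactions_missing_on_node1;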

Conclusion

The obvious conclusion here is to ensure that you're properly monitoring your disk space and that you never let your MySQL server run out of space. Make sure to monitor all volumes if you've spread your data files, binary log files and other log files across different volumes. All writes are important.
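
If you're not sure which paths to feed into that monitoring, the server can tell you where it writes. A quick query like this (a sketch; these system variables exist on any 8.0 server) lists the locations of the data directory, binary logs, relay logs, InnoDB redo logs and temporary files:

mysql> SELECT @@datadir, @@log_bin_basename, @@relay_log_basename,
    ->        @@innodb_log_group_home_dir, @@tmpdir\G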

To report this issue to the development team, I opened a bug report on bugs.mysql.com. Feel free to click "affects me" if, like me, you consider this a significant problem.
