Pythian Blog: Technical Track

When SHOW SLAVE STATUS and the error log Disagree

Or, When MySQL Lies!

When I do a show slave status\G, sometimes mysqld will lie to me and give me a wrong Exec_Master_Log_Pos. Let me explain with a situation from last night.

This is the output of show slave status\G from mysql version 5.0.41-community-log:

mysql> show slave status \G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: XXX.XXX.XXX.XXX
                   Master_User: replic_username
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mysql-bin.000480
           Read_Master_Log_Pos: 690470773
                Relay_Log_File: db2-relay-bin.000028
                 Relay_Log_Pos: 683977007
      Relay_Master_Log_File: mysql-bin.000480
              Slave_IO_Running: Yes
             Slave_SQL_Running: No
               Replicate_Do_DB:
           Replicate_Ignore_DB: 
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 126
               Relay_Log_Space: 690471192
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: NULL

So in summary, the slave SQL thread is stuck (in this case because of a problem during the transfer of the binlog data to the slave’s relay log). The show slave status\G command tells me that it is stuck at the master binlog file mysql-bin.000480, position 126.

But, if I look at the error log file entries when the slave got stuck I see:

080206 16:03:48 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
080206 16:03:48 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000480' position 683976897
080206 16:03:48 [Note] Slave: connected to master 'replic_username@XXX.XXX.XXX.XXX:3306',replication resumed in log  'mysql-bin.000480' at position 683976897
080206 16:04:31 [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 1966107, event_type: 16
080206 16:04:31 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
080206 16:04:31 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
080206 16:04:31 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000480' position 126

I’ll compare both outputs. show slave status tells me that I’m stuck at:

Relay_Master_Log_File: mysql-bin.000480
Exec_Master_Log_Pos: 126

The error log file says at first that it is stuck at:

'mysql-bin.000480' position 683976897

then later in the error log is says it is stuck at:

'mysql-bin.000480' position 126

So obviously, the last information given in the error log as well as in the show slave status command is wrong. If I had to choose, I would restart the slave at log mysql-bin.000480 position 683976897 using the command: CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000480', MASTER_LOG_POS=683976897;. If not, I’ll be going to try to re-run a full binlog that has been already applied.

So, when a slave gets stuck on relay log corruption, my own general rules are as follows:

  1. If the position is less then, say, 500, suspect that that the position is wrong (it might be right but it needs to be checked out).
  2. Always go in the error log to see what it has to say about position.
  3. If the position in the error log as well as in the show slave status agree, more-or-less (both are a big number), choose the one from show slave status. Otherwise, pick the one in the error log.

NOTE: Please understand that it is possible for the slave to be stuck at a small position. This would happen if the master did change the binlog and the slave got stuck just after the switch of the master binlog. So be careful.

No Comments Yet

Let us know what you think

Subscribe by email