
MySQL streaming Xtrabackup to slave recovery

Overview

There are times when I need to restore a slave from a backup of a master or another slave, but too many times I have taken the typical approach of taking the backup on the source server and then copying it to the target server. This takes a great deal of time, especially as your database grows in size. These steps detail how to use Netcat (nc) and Percona XtraBackup (innobackupex) to stream the backup from the source server to the target server, saving a great deal of time by copying the data only once to the desired location. While the data is streaming to the target server, it is compressed and then uncompressed on the fly, reducing the amount of traffic going across the network by around 85% to 90% (typical backup compression ratios I have seen for MySQL InnoDB are in this range). I will also provide a simple script that completes these steps, to give you a start at creating your own customized solution.

Requirements

In order to accomplish this task, you need to keep the following items in mind:
  • Netcat (nc) - Application needed on both servers, used for data streaming.
  • Percona Xtrabackup - Application needed on both servers, used to perform the backup.
  • MySQL Access - MySQL access is required to take the backup on the source server and to configure replication on the target server.
  • Pigz (optional) - Only needed if you want to compress and uncompress the stream on the fly. If you are going to use this, it is needed on both servers.
  • Debian - All code and scripts were tested using Debian. Commands may change slightly with other operating systems. A sample package install is sketched below.
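As a rough sketch, the tooling can be installed on Debian with something like the following. The exact package names are assumptions to verify against your release; Percona XtraBackup in particular typically comes from Percona's own apt repository.

[code language="bash"]
# Assumed Debian package names - verify against your release and the Percona apt repository
apt-get install netcat pigz percona-xtrabackup
[/code]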

Steps

Here are the steps that are required to accomplish this task. The source server is the server where the backup is coming from. The target server is where the backup is going to.

Step 1. Stop MySQL on target server and clear MySQL data

On the server that needs to be restored, we will make sure that MySQL is stopped. Then we will clear out the old data, as this will all be replaced with the backup coming from the source server. The example assumes your MySQL data directory is /var/lib/mysql.

[code language="bash"]
service mysql stop
rm -rf /var/lib/mysql/*
[/code]
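If you are not certain of the data directory path, it is worth confirming it before deleting anything. A quick check like the one below, run while MySQL is still up, is one way to do that.

[code language="bash"]
# Confirm the actual data directory before wiping it (run before stopping MySQL)
mysql -NBe "SELECT @@datadir"
[/code]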

Step 2. Start listener on target server

Now that the target server has its MySQL data directory cleared of its old data, it is ready to receive the new backup directly from the source server. The port 2112 can be changed, but it needs to match in both the source and target commands. The nc command and options may vary by OS.

[code language="bash"]
nc -l -p 2112 | unpigz -c | xbstream -x -C /var/lib/mysql
[/code]

With no compression:

[code language="bash"]
nc -l -p 2112 | xbstream -x -C /var/lib/mysql
[/code]
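Before kicking off the backup, you can sanity-check that the listener is actually up on the target server. Something along these lines should show the chosen port in a listening state (ss is assumed to be available; netstat -lnt works similarly).

[code language="bash"]
# Confirm nc is listening on the chosen port (2112 in this example)
ss -lnt | grep 2112
[/code]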

Step 3. Start backup on source server

The listener is now up on the target server, ready to accept the connection, so we can start the backup on the source server and stream it to the target. Set the --parallel option to the number of CPU cores on your server, using the lower core count of the source and target servers. The port 2112 can be changed, but it needs to match in both the source and target commands. The nc command and options may vary by OS.

[code language="bash"]
innobackupex --stream=xbstream --parallel=4 /tmp | pigz -c --fast | nc -w 2 TARGET_SERVER 2112
[/code]

With no compression:

[code language="bash"]
innobackupex --stream=xbstream --parallel=4 /tmp | nc -w 2 TARGET_SERVER 2112
[/code]
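While the stream is running, one rough way to gauge progress is to watch the data directory grow on the target server and compare it to the data size on the source. This is only a sketch; it assumes du and df are close enough proxies for your data set.

[code language="bash"]
# On the target server: watch the restored data directory grow every 5 seconds
watch -n 5 "du -sh /var/lib/mysql; df -h /var"
[/code]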

Step 4. Prepare backup on target server

Percona XtraBackup requires that you prepare the backup after it has completed, to apply any outstanding logs and get the database ready to be started. Use as much memory on your target server as you can without causing it to go OOM (out of memory). If nothing else is running on the server, 75% of total memory is a good starting point; on a server with 4G of RAM you could safely set --use-memory to 3G.

[code language="bash"]
innobackupex --use-memory=3G --apply-log /var/lib/mysql
[/code]
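To turn the 75% guideline into a number, a quick calculation against /proc/meminfo can be used as a starting point. This is only a sketch; round down and leave headroom for anything else running on the server.

[code language="bash"]
# Roughly 75% of total RAM, in whole gigabytes, as a starting value for --use-memory
awk '/MemTotal/ { printf "%dG\n", ($2 * 0.75) / 1024 / 1024 }' /proc/meminfo
[/code]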

Step 5. Update ownership and start MySQL

Now that the apply-logs step has completed on your backup, you should be able to update the ownership of the files to mysql:mysql, and then start the MySQL service.

[code language="bash"]
chown -R mysql:mysql /var/lib/mysql
service mysql start
[/code]
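If MySQL does not come up cleanly, the error log is the first place to look. The log path below is a common Debian default, but treat it as an assumption and check your configuration.

[code language="bash"]
# Confirm the server is answering, and review the error log if it is not
mysqladmin status
tail -n 50 /var/log/mysql/error.log
[/code]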

Step 6. Configure replication

If the source server is a slave, you should be able to simply start the new slave, as the replication positioning will already be configured. If your source server is the master, you will have to determine whether you are using GTID or legacy replication. If you are using GTIDs, you should be able to start replication by setting gtid_purged and using the master auto position parameter. If you are using legacy replication, you can find the master log file and position in the xtrabackup_binlog_info file in your backup directory; in this scenario, the backup directory is the MySQL data directory (/var/lib/mysql/) on the target server.

GTID Replication

[code language="bash"]
mysql -e "SET GLOBAL gtid_purged='<GTID_FROM_xtrabackup_binlog_info>';"
mysql -e "CHANGE MASTER TO MASTER_HOST='<MASTER_SERVER>', MASTER_USER='<REPL_USER>', MASTER_PASSWORD='<REPL_PASS>', MASTER_AUTO_POSITION = 1; START SLAVE;"
[/code]

Legacy Replication

[code language="bash"]
cat /var/lib/mysql/xtrabackup_binlog_info
mysql -e "CHANGE MASTER TO MASTER_HOST='<MASTER_SERVER>', MASTER_USER='<REPL_USER>', MASTER_PASSWORD='<REPL_PASS>', MASTER_LOG_FILE='<LOG_FROM_xtrabackup_binlog_info>', MASTER_LOG_POS=<POSITION_FROM_xtrabackup_binlog_info>; START SLAVE;"
[/code]
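Whichever method you use, it is worth confirming that replication is actually running and catching up once the slave is started. A quick check along these lines is usually enough.

[code language="bash"]
# Both threads should report Yes, and Seconds_Behind_Master should be trending down
mysql -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Last_Error"
[/code]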

Script

Below you will find the simple script I came up with to get you started on automating this task so you can quickly rebuild a slave in your own environment. The script was created with a number of assumptions that may not hold in your environment, so please make sure to update it accordingly.
  • The script runs from a separate server that has access to both MySQL servers. It may work when run on one of the MySQL servers, but this has not been tested.
  • The account running the commands on the remote servers has sudo access to run commands.
  • SSH keys are set up so the script can SSH to both MySQL servers without a password prompt.
  • A .my.cnf is configured in the home directory of the account used to SSH to the MySQL servers, allowing the script to run MySQL and XtraBackup commands without a password prompt (see the example after this list).
  • The following software is installed on the MySQL servers: netcat (nc), pigz, xtrabackup.
  • Firewall rules allow traffic on the port used for the Netcat stream.
  • All my testing was on Debian servers. On other operating systems and Netcat versions there is a -d flag for running nc in the background; on Debian you have to use -p when telling netcat which port to listen on.
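As a minimal sketch of that .my.cnf, assuming a hypothetical admin account, something like the following in the SSH user's home directory on each MySQL server lets mysql and innobackupex run without prompting. The user name and password are placeholders.

[code language="bash"]
# Hypothetical credentials - replace with an account that can run backups and CHANGE MASTER
cat > ~/.my.cnf <<'EOF'
[client]
user=dba_user
password=dba_password
EOF
chmod 600 ~/.my.cnf
[/code]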
Script

[code language="bash"]
#!/bin/bash

SOURCE_SERVER=$1
TARGET_SERVER=$2

# This is if you have /var/lib/mysql on the / (root) drive. You can change this to '/'
# to capture the size of the drive to get an idea of how long is left on your backup
MOUNT_CONTAINING_DATADIR='/var'

# Should match the number of CPUs on your server
BACKUP_THREADS=2

# Seconds to wait in the loop that checks whether the backup has completed
TIMER=5

# Amount of memory to use to apply the logs to the newly backed up server
MEMORY_GB_USED_APPLY_LOGS='1G'

# Change this to 1 if you want to configure the target as a slave of the source server
CONFIGURE_TARGET_AS_SLAVE_OF_SOURCE=1

GTID_STATUS=''
SSH='ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
NETCAT_PORT=2112
REPL_USER='repl'
REPL_PASS='repl_PASSWORD'
MYSQL_DATADIR='/var/lib/mysql'

echo ""
echo "Starting MySQL Streaming Slave Rebuild Script"
echo ""

if [ ! $1 ]; then
  echo "SOURCE Server not set. Please run like :"
  echo ""
  echo "$0 <source_server> <target_server>"
  echo ""
  exit 1
fi

# VALIDATE EXECUTION
echo "Are you sure you wish to perform this rebuild. THIS IS DESTRUCTIVE!!!"
echo " SOURCE SERVER (Performing Backup) : $SOURCE_SERVER"
echo " TARGET SERVER (Receiving Backup) : $TARGET_SERVER"
echo " All files in $MYSQL_DATADIR on $TARGET_SERVER will be DELETED!!!"
echo -n "START NOW ? (y/n) : "
read CONFIRM
if [ $CONFIRM == "y" -o $CONFIRM == "Y" ]; then
  echo ""
  echo "STARTING REBUILD!!!"
else
  echo "Y or y was not chosen. Exiting."
  exit 1
fi

# PREPARE TARGET
echo " Stopping MySQL"
$SSH $TARGET_SERVER "service mysql stop"
echo " Clearing $MYSQL_DATADIR"
$SSH $TARGET_SERVER "rm -rf $MYSQL_DATADIR/*"

# PERFORM STREAMING BACKUP
echo " Start Listener on Target server $TARGET_SERVER:$NETCAT_PORT to accept the backup and place it in $MYSQL_DATADIR"
$SSH $TARGET_SERVER "nc -l -p $NETCAT_PORT | unpigz -c | xbstream -x -C $MYSQL_DATADIR" > /dev/null 2>&1 &
echo " Starting backup on source server $SOURCE_SERVER:$NETCAT_PORT to stream backup"
$SSH $SOURCE_SERVER "innobackupex --stream=xbstream --parallel=$BACKUP_THREADS /tmp | pigz -c --fast | nc -w 2 $TARGET_SERVER $NETCAT_PORT" > /tmp/backup_log 2>&1 &
sleep 4

echo " Watching backup every $TIMER seconds to validate when the backup is complete"
LOOP=1
while [ 1 -eq $LOOP ]; do
  BACKUP_PROCESSES=`$SSH $SOURCE_SERVER "ps aux | grep -v grep | grep -w innobackupex | wc -l"`
  if [ $BACKUP_PROCESSES -eq 0 ]; then
    echo " Backup has COMPLETED!!"
    LOOP=2
  else
    echo " Backup is Running!"
    sleep $TIMER
  fi
done

# PREPARE AND COMPLETE BACKUP ON TARGET
echo " Applying logs to the Xtrabackup"
$SSH $TARGET_SERVER "innobackupex --use-memory=$MEMORY_GB_USED_APPLY_LOGS --apply-log $MYSQL_DATADIR" > /tmp/backup_log 2>&1 &
sleep 3
LOOP=1
while [ 1 -eq $LOOP ]; do
  APPLY_PROCESSES=`$SSH $TARGET_SERVER "ps aux | grep -v grep | grep innobackupex | wc -l"`
  if [ $APPLY_PROCESSES -eq 0 ]; then
    echo " Apply logs has COMPLETED!!"
    LOOP=2
  else
    echo " Apply Logs Running!"
    sleep $TIMER
  fi
done
sleep 1

echo " Updating ownership on the backup files so that MySQL owns them"
$SSH $TARGET_SERVER "chown -R mysql:mysql $MYSQL_DATADIR"
echo " Starting MySQL"
$SSH $TARGET_SERVER "service mysql start"

if [ $CONFIGURE_TARGET_AS_SLAVE_OF_SOURCE -eq 1 ]; then
  echo " Configuring Replication"
  GTID_STATUS=`$SSH $SOURCE_SERVER "mysql -BN -e \"SHOW VARIABLES LIKE 'gtid_mode'\"" | grep -w ON | wc -l`
  if [ $GTID_STATUS -gt 0 ]; then
    echo "Found GTID ON. Using Master Auto Position. SLAVE STARTED"
    GTID_POS=`$SSH $TARGET_SERVER "cat $MYSQL_DATADIR/xtrabackup_binlog_info" | awk '{print $3}' | head -n 1 | sed 's/,//'`
    $SSH $TARGET_SERVER "mysql -e \"SET GLOBAL gtid_purged='$GTID_POS';\""
    $SSH $TARGET_SERVER "mysql -e \"CHANGE MASTER TO MASTER_HOST='$SOURCE_SERVER', MASTER_USER='$REPL_USER', MASTER_PASSWORD='$REPL_PASS', MASTER_AUTO_POSITION = 1; START SLAVE;\""
  else
    echo "Found GTID not ON. Grabbing positioning from the backup file and using that to configure replication"
    MASTER_LOG=`$SSH $TARGET_SERVER "cat $MYSQL_DATADIR/xtrabackup_binlog_info" | awk '{print $1}'`
    MASTER_POS=`$SSH $TARGET_SERVER "cat $MYSQL_DATADIR/xtrabackup_binlog_info" | awk '{print $2}'`
    echo "Setting the slave to $MASTER_LOG and $MASTER_POS. SLAVE STARTED"
    $SSH $TARGET_SERVER "mysql -e \"CHANGE MASTER TO MASTER_HOST='$SOURCE_SERVER', MASTER_USER='$REPL_USER', MASTER_PASSWORD='$REPL_PASS', MASTER_LOG_FILE='$MASTER_LOG', MASTER_LOG_POS=$MASTER_POS; START SLAVE;\""
  fi
fi
[/code]

Script output

[code language="bash"]
[root@bastion DB]# ./mysql_file_streaming_slave_rebuild.sh 10.10.10.198 10.10.10.197

Starting MySQL Streaming Slave Rebuild Script

Are you sure you wish to perform this rebuild. THIS IS DESTRUCTIVE!!!
 SOURCE SERVER (Performing Backup) : 10.10.10.198
 TARGET SERVER (Receiving Backup) : 10.10.10.197
 All files in /var/lib/mysql on 10.10.10.197 will be DELETED!!!
START NOW ? (y/n) : y

STARTING REBUILD!!!
 Stopping MySQL
 Clearing /var/lib/mysql
 Start Listener on Target server 10.10.10.197:2112 to accept the backup and place it in /var/lib/mysql
 Starting backup on source server 10.10.10.198:2112 to stream backup
 Watching backup every 5 seconds to validate when the backup is complete
 Backup is Running!
 Backup is Running!
 Backup is Running!
 Backup has COMPLETED!!
 Applying logs to the Xtrabackup
 Apply Logs Running!
 Apply Logs Running!
 Apply logs has COMPLETED!!
 Updating ownership on the backup files so that MySQL owns them
 Starting MySQL
 Configuring Replication
Found GTID ON. Using Master Auto Position. SLAVE STARTED
[/code]

Conclusion

I have found that this has greatly reduced the time it takes me to recover a failed slave. By transferring the data only once from the source server to the target server, and by compressing the data during the transfer, I feel this is one of the most efficient methods of recovering a failed slave or building a new one.
