Pythian Blog: Technical Track

Backup and data streaming with xbstream, tar, socat, and netcat

On April 4th, 2012, Percona released Xtrabackup 2.0 to GA along with a new streaming feature called xbstream. This new tool allowed for compression and parallelism of streaming backups when running xtrabackup or innobackupex, without having to stream using tar, then pipe to gzip or pigz, then pipe to netcat or socat to send the backup to the recipient server. This simplified the command structure a great deal, and xbstream fast became the preferred way of streaming backups from an origin server to its destination.

In recent months we’ve had discussions internally as to whether xbstream would be a better way of streaming large amounts of data between servers for use cases outside of xtrabackup. And which is better, socat or netcat? So I decided to put this to the test.

To test this, I created two m5.xlarge EC2 instances, as this instance type provides an “up to 10 gigabit” level of network performance. I put both instances in the same availability zone to reduce the chance of poor networking skewing my results. I then installed Percona XtraDB Server 5.6 and Xtrabackup 2.4.9, and created a simple database with a data set size of 90GB.

For my first test I took a streaming backup of the entire data set using both the xbstream and tar streaming methods. Compression was not used, so that the streaming methods could be evaluated equally. Both socat and netcat were evaluated.

XBSTREAM / NETCAT TESTS
[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=1 ./ | nc 172.31.55.250 10001
 171228 15:11:13 innobackupex: Starting the backup operation
 .....
 xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
 171228 15:25:22 completed OK!
 real 14m9.385s
 user 3m27.392s
 sys 3m34.420s
 
 [root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=2 ./ | nc 172.31.55.250 10001
 171228 15:38:50 innobackupex: Starting the backup operation
 .....
 xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
 171228 15:50:42 completed OK!
 real 11m51.915s
 user 3m31.808s
 sys 3m34.740s
 
 [root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=4 ./ | nc 172.31.55.250 10001
 171228 15:38:50 innobackupex: Starting the backup operation
 .....
 xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
 171228 16:07:28 completed OK!
 real 11m51.923s
 user 3m27.836s
 sys 3m30.088s
XBSTREAM / SOCAT TESTS
[root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=1 ./ | socat -u stdio TCP:172.31.55.250:10001
 171228 16:13:51 innobackupex: Starting the backup operation
 .......
 xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
 171228 16:26:55 completed OK!
 real 13m3.911s
 user 3m8.208s
 sys 2m35.160s
 
 [root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=2 ./ | socat -u stdio TCP:172.31.55.250:10001
 171228 16:28:16 innobackupex: Starting the backup operation
 .....
 xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
 171228 16:40:08 completed OK!
 real 11m51.984s
 user 3m8.148s
 sys 2m28.860s
 
 [root@ip-172-31-54-219 ~]# time innobackupex --stream=xbstream --parallel=4 ./ | socat -u stdio TCP:172.31.55.250:10001
 171228 16:44:54 innobackupex: Starting the backup operation
 .......
 xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
 171228 16:56:46 completed OK!
 real 11m51.916s
 user 3m7.460s
 sys 2m24.968s
TAR / NETCAT TEST
[root@ip-172-31-54-219 ~]# time innobackupex --stream=tar --parallel=1 ./ | nc 172.31.55.250 10001
 171228 17:02:26 innobackupex: Starting the backup operation
 .......
 xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
 171228 17:16:09 completed OK!
 real 13m42.910s
 user 3m19.696s
 sys 3m47.672s
TAR / SOCAT TEST
[root@ip-172-31-54-219 ~]# time innobackupex --stream=tar --parallel=1 ./ | socat -u stdio TCP:172.31.55.250:10001
 171228 17:19:59 innobackupex: Starting the backup operation
 ......
 xtrabackup: Transaction log of lsn (119373249297) to (119373249297) was copied.
 171228 17:33:03 completed OK!
 real 13m3.940s
 user 2m59.468s
 sys 2m29.388s
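
The runs above show only the sending side of each pipe. As a rough sketch of what the receiving host could run before the sender starts, the helpers below listen on a port and unpack the incoming stream. The function names, the port, and the destination directory are hypothetical and not taken from the tests; the netcat listen syntax in particular differs between netcat builds, so treat that variant as an assumption to check against the local man page.

```shell
# Hypothetical receiver-side helpers (names and arguments are illustrative).
# Usage: receive_xbstream PORT DEST_DIR

# Listen with socat and unpack an xbstream-format stream into DEST_DIR.
receive_xbstream() {
  socat -u TCP-LISTEN:"$1",reuseaddr stdio | xbstream -x -C "$2"
}

# Listen with socat and unpack an innobackupex tar stream into DEST_DIR.
# The -i flag (ignore zero blocks) is needed for innobackupex tar streams.
receive_tar() {
  socat -u TCP-LISTEN:"$1",reuseaddr stdio | tar -xif - -C "$2"
}

# netcat variant. Assumption: this build accepts `nc -l PORT`;
# traditional netcat expects `nc -l -p PORT` instead.
receive_xbstream_nc() {
  nc -l "$1" | xbstream -x -C "$2"
}
```

For example, `receive_xbstream 10001 /data/backup` on the destination host would pair with the `--stream=xbstream` senders shown above, where `/data/backup` is a placeholder path.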

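Here is a summary of the output noted above, in seconds (real time, rounded to the nearest second).

 Method              Threads  Seconds
 xbstream / netcat   1        849
 xbstream / netcat   2        712
 xbstream / netcat   4        712
 xbstream / socat    1        784
 xbstream / socat    2        712
 xbstream / socat    4        712
 tar / netcat        1        823
 tar / socat         1        784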
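
For anyone who wants to reproduce the figures in seconds, the `real` values reported by `time` can be converted with a small helper, and dividing the data set size by the elapsed time gives a rough throughput figure. This is only a sketch: the function names are made up for illustration, and the throughput assumes the data set is 90 decimal gigabytes (90,000 MB), which is an assumption about the unit rather than a measured value.

```shell
# Convert a `time`-style duration such as "14m9.385s" into seconds.
to_seconds() {
  echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Rough throughput in MB/s for a data set of GB decimal gigabytes
# transferred in SECS seconds (unit of the data set is an assumption).
throughput_mb_s() {
  awk -v gb="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / secs }'
}

# Three of the xbstream runs above: netcat with 1 thread,
# socat with 1 thread, and netcat with 2 threads.
for t in 14m9.385s 13m3.911s 11m51.915s; do
  secs=$(to_seconds "$t")
  echo "$t = ${secs}s, approx. $(throughput_mb_s 90 "$secs") MB/s"
done
```

Under that 90 GB assumption, every multi-threaded run finishing in about 712 seconds works out to roughly 126 MB/s, or about 1 Gbit/s, which would be consistent with all of them running into the same ceiling.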

You’ll notice that the xbstream method outperformed the tar method once we started introducing parallel threads. You may also note that the performance gains stopped beyond 2 threads, which is likely because we hit a networking bottleneck. Another interesting thing to note is that with a single thread socat outperformed netcat, but with multiple threads they were about equal.

So what does this mean for moving data outside of xtrabackup / innobackupex? For my next test I decided to focus on just the large data files that I created in the test schema directory, the main reason being that xbstream can handle files but not directories, and cannot act recursively. First I used xbstream and then tried again using tar. Again, compression was not used so we could look at just the streaming method. Both netcat and socat were evaluated.

XBSTREAM / NETCAT TESTS
[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 1 ./t* | nc 172.31.55.250 10001
 real 12m25.439s
 user 0m20.928s
 sys 3m43.492s
 
 [root@ip-172-31-54-219 streamtest]# time xbstream -c -p 2 ./t* | nc 172.31.55.250 10001
 real 12m28.086s
 user 0m22.996s
 sys 3m50.972s
 
 [root@ip-172-31-54-219 streamtest]# time xbstream -c -p 4 ./t* | nc 172.31.55.250 10001
 real 13m15.775s
 user 0m21.460s
 sys 3m50.336s
XBSTREAM / SOCAT TESTS
[root@ip-172-31-54-219 streamtest]# time xbstream -c -p 1 ./t* | socat -u stdio TCP:172.31.55.250:10001
 real 11m47.781s
 user 0m17.132s
 sys 2m38.168s
 
 [root@ip-172-31-54-219 streamtest]# time xbstream -c -p 2 ./t* | socat -u stdio TCP:172.31.55.250:10001
 real 11m47.707s
 user 0m15.816s
 sys 2m22.884s
 
 [root@ip-172-31-54-219 streamtest]# time xbstream -c -p 4 ./t* | socat -u stdio TCP:172.31.55.250:10001
 real 11m47.805s
 user 0m16.796s
 sys 2m36.588s
TAR / NETCAT TEST
[root@ip-172-31-54-219 streamtest]# time tar -cf - ./t* | nc 172.31.55.250 10001
 real 11m47.942s
 user 0m5.260s
 sys 2m32.048s
TAR / SOCAT TEST
[root@ip-172-31-54-219 streamtest]# time tar -cf - ./t* | socat -u stdio TCP:172.31.55.250:10001
 real 11m47.914s
 user 0m4.860s
 sys 1m37.632s

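Here is a summary of the output noted above, in seconds (real time, rounded to the nearest second).

 Method              Threads  Seconds
 xbstream / netcat   1        745
 xbstream / netcat   2        748
 xbstream / netcat   4        796
 xbstream / socat    1        708
 xbstream / socat    2        708
 xbstream / socat    4        708
 tar / netcat        -        708
 tar / socat         -        708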

In this test we can see that almost all the methods worked equally well, the only less efficient one being the xbstream / netcat combination. Keep in mind that changing the number of parallel threads with the xbstream -p option didn’t really seem to have an effect, because xbstream will not leverage parallel threads on its own. It needs to be working with another tool, like xtrabackup, that is able to take advantage of the parallelism.

CONCLUSION

When working with xtrabackup / innobackupex, it looks like xbstream and socat is the way to go. If you’re streaming backups and are not taking advantage of multiple threads, you should consider doing so. For large data copies from one server to another, it looks like you’re safe using xbstream or tar, so long as the combination of xbstream and netcat is avoided. Considering that xbstream will not work with directories or act recursively on its own, it may just be easier to stick with tar.
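
To illustrate why tar is the simpler choice for directory trees, here is a minimal local sketch: it builds a small nested directory, streams it through a pipe with tar, and checks that the copy matches. The file names and temporary directories are invented for the example, and the socat lines in the comments show where the network hop would sit; they reuse the address and port from the tests above but are not run here.

```shell
# Build a small nested directory tree to stand in for a data directory.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/data/sub"
echo "table one" > "$src/data/t1.ibd"
echo "table two" > "$src/data/sub/t2.ibd"

# tar walks the directory recursively, so a single path is enough.
# Locally the stream goes straight through a pipe; between hosts the
# two halves would sit on either side of socat, for example:
#   sender:   tar -cf - -C "$src" data | socat -u stdio TCP:172.31.55.250:10001
#   receiver: socat -u TCP-LISTEN:10001,reuseaddr stdio | tar -xf - -C "$dst"
tar -cf - -C "$src" data | tar -xf - -C "$dst"

# Confirm that the nested files arrived intact.
diff -r "$src/data" "$dst/data" && echo "directory tree copied"
```

Doing the same with xbstream would mean expanding the tree into a list of files first, which is the extra step the conclusion above suggests avoiding.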
