Pythian Blog: Technical Track

The night of the living files

Scary movies for teenagers and database administration tasks share some common traits. Usually, the movie starts with a group of teens making unwise choices. Maybe you, as a DBA, are no longer a teenager, but you shouldn't underestimate your ability to do the same.
$ sudo su -
 Darrera entrada: ds jun 2 14:43:53 UTC 2018 des de a a pts/0
 [root@mysql ~]# cd /var/lib/mysql
 [root@mysql mysql]# df -h .
 Filesystem Size Used Avail Use% Mounted on
 /dev/mapper/VolGroup00-LogVol00 38G 35G 3G 92% /
 [root@mysql mysql]#
 
Back to the movie…today is just another regular day. Checking Jira (or whatever your ticketing system is), answering emails and phone calls. Maybe a couple of meetings. And in the meantime, you will perform some critical tasks on the most important systems for your company. Just the usual stuff!
[root@mysql mysql]# ls -l *.log
 ls: cannot access *.log: No such file or directory
 [root@mysql mysql]# ls -l *log*
 -rw-r-----. 1 mysql mysql 50331648 Jun 2 15:07 ib_logfile0
 -rw-r-----. 1 mysql mysql 50331648 Jun 2 14:53 ib_logfile1
 [root@mysql mysql]# rm -f ib*
 [root@mysql mysql]#
 

There's a shadow behind you

Right now you are not able to remember why… and it doesn't matter anymore... but you had to perform a few tasks in a staging environment, release some storage by removing files, but you did it in production. You (I mean YOU) removed critical files from the production database. Suddenly, you perceive the damage you have caused. You become aware of the full scale of what you've done. You feel like you're going to die.
[root@mysql mysql]# ls -l ib*
 ls: cannot access ib*: No such file or directory
 [root@mysql mysql]#
 

Don't go into the light, Carol Anne

While you are starting to feel really sick, you realize that you should be getting thousands of pages, and a few of the screens near you should display a bunch of red blinking alerts. And a few people should be asking you about what's happening. But--now, everything looks fine. Again, in teen horror movies, there is a moment when our heroes have a sense of the real danger ahead, but this danger is ignored by the other people.
[root@mysql mysql]# ps -ef | grep mysql | grep -v grep
 mysql 2983 1 0 10:16 ? 00:00:00 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
 [root@mysql mysql]#
 [root@mysql mysql]# mysql -p
 Enter password:
 Welcome to the MySQL monitor. Commands end with ; or \g.
 Your MySQL connection id is 5
 Server version: 5.7.22-22 Percona Server (GPL), Release 22, Revision f62d93c
 Copyright (c) 2009-2018 Percona LLC and/or its affiliates
 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
 Oracle is a registered trademark of Oracle Corporation and/or its
 affiliates. Other names may be trademarks of their respective
 owners.
 
 Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
 mysql> show processlist;
 +----+------+-----------+------+---------+------+----------+------------------+-----------+---------------+
 | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
 +----+------+-----------+------+---------+------+----------+------------------+-----------+---------------+
 REDACTED TO PROTECT THE INNOCENT
 +----+------+-----------+------+---------+------+----------+------------------+-----------+---------------+
 XXX rows in set (0.00 sec)
 
 mysql>
 
You feel like… maybe I failed to remove the files. But then you check and the files are missing. You did it all right, you removed the files. The database should be shaking and trembling before dying in pain. But it is not.

BYOO (Bring your own Ouija)

You breathe deeply and remember that there's no such a thing as removing a file. What you did is remove metadata information from the filesystem, but as long as the file is open by a process, this process is able to access it. Read and write. Nobody else can… but the process still keeps a link to that file, a link that keeps this file between us and the hell of removed files. You must keep the database processes running, and by doing so you ensure that the removed files are available. But you know that this is impossible. You can't keep your database up and running forever. So now what? Think, think, think!

It's easy, blah, blah, blah and you are dead

Another horror picture cliché occurs when one of the characters says that he knows how to fix the problem, both quickly and easily. In our movie, the easy and quick fix (and wrong fix) is to perform a logical dump of the database. Unfortunately, a dump of several TB databases will take some time… and then you have to restore the dump back. A long downtime is not the best option. And to make things worse, after restoring the backup, you need to apply the binary logs to bring the database up to date.

Run, just run!

While you look at your mates being devoured by the mysqldump zombies, you understand that a better option is required if you want to return home safely. The lsof command (aka List Open Files) can help you understand the problem.
[root@mysql mysql]# lsof -c mysql | grep /var/lib/mysql/ib
 mysqld 2983 mysql 3uW REG 253,0 50331648 34323185 /var/lib/mysql/ib_logfile0 (deleted)
 mysqld 2983 mysql 9uW REG 253,0 50331648 34323186 /var/lib/mysql/ib_logfile1 (deleted)
 mysqld 2983 mysql 10uW REG 253,0 12582912 34323184 /var/lib/mysql/ibdata1 (deleted)
 mysqld 2983 mysql 12uW REG 253,0 12582912 34323188 /var/lib/mysql/ibtmp1 (deleted)
Here you can see the missing files with The Mark of Cain: Deleted

In /proc we trust

The proc filesystem is an illusionary filesystem that presents information about processes (and other system information). Does the proc filesystem give information about open files for a process? Yes, it does. But not only this. In the proc filesystem, you can access any of the files that a process has currently open. Remember that 2983 is the pid of mysqld.
[root@mysql mysql]# ls -l /proc/2983/fd | grep "lib/mysql/ib"
 lrwx------. 1 mysql mysql 64 Jun 28 10:27 10 -> /var/lib/mysql/ibdata1 (deleted)
 lrwx------. 1 mysql mysql 64 Jun 28 10:27 12 -> /var/lib/mysql/ibtmp1 (deleted)
 lrwx------. 1 mysql mysql 64 Jun 28 10:27 3 -> /var/lib/mysql/ib_logfile0 (deleted)
 lrwx------. 1 mysql mysql 64 Jun 28 10:27 9 -> /var/lib/mysql/ib_logfile1 (deleted)
Yes, you can access any file that a process has open, even if that file is no longer there. In the directory /proc/<pid>/fd, you have a bunch of soft links to the original files, even if these files were removed. The links and the files are available as long as the process doesn't close them.
"MySQL 5.7.22-2[root@mysql mysql]# head -c 31 /proc/2983/fd/3
 "MySQL 5.7.22-22[root@mysql mysql]#
 [root@mysql mysql]#

Between a rock and a hard link

Wait a minute… we have soft links to removed files, and we can access those files. It would be great if we could create a hard link based on a soft link. You know that the ln command is able to create both soft links and hard links. You check the main page for ln and discover the option -L that dereferences soft links. You execute a quick test to make sure that ln -L does what you think it should do… and it does!
[root@mysql mysql]# cd /tmp
 [root@mysql tmp]# touch test_file
 [root@mysql tmp]# ln -s test_file test_file_link
 [root@mysql tmp]# ls -l test_file*
 -rw-r--r--. 1 root root 0 Jun 2 15:32 test_file
 lrwxrwxrwx. 1 root root 9 Jun 2 15:33 test_file_link -> test_file
 [root@mysql tmp]# echo AreYouThere > test_file
 [root@mysql tmp]# ln -L test_file_link test_file_hard
 [root@mysql tmp]# ls -l test*
 -rw-r--r--. 2 root root 12 Jun 2 15:33 test_file
 -rw-r--r--. 2 root root 12 Jun 2 15:33 test_file_hard
 lrwxrwxrwx. 1 root root 9 Jun 2 15:33 test_file_link -> test_file
 [root@mysql tmp]# cat test_file_hard
 AreYouThere
 [root@mysql tmp]# rm test_file
 rm: remove regular file ‘test_file'? y
 [root@mysql tmp]# cat test_file_hard
 AreYouThere
 [root@mysql tmp]# cat test_file_link
 cat: test_file_link: No such file or directory
 [root@mysql tmp]#

Easy Peasy Lemon Squeezy

Now it is time to apply our fix. We will create hard links from the soft links.
[root@mysql tmp]# ln -L /proc/2983/fd/3 /var/lib/mysql/ib_logfile0
 ln: failed to create hard link ‘/var/lib/mysql/ib_logfile0' => ‘/proc/2983/fd/3': No such file or directory
 [root@mysql tmp]# ls -l /proc/2983/fd/3
 lrwx------. 1 mysql mysql 64 Jun 2 15:22 /proc/2983/fd/3 -> /var/lib/mysql/ib_logfile0 (deleted)
The file is there, but we can't create the hard link! Why? There are two reasons that explain why you can't create the hard link. The first one is that ln -L tries to retrieve the filesystem information from the source of the soft link in /proc/2983/fd/3. This is /var/lib/mysql/ib_logfile0, we all know that it has been deleted. But, if the destination is deleted, why we can see the contents of that file in /proc? Well, because, as we've seen before, /proc is an illusionary filesystem. The part of illusionary explains why we can see the contents of that file even if has been deleted. The part of the filesystem that gives us another reason not to be able to create a hard link from a file "stored" in /proc… you can't create hard links between different filesystems. It was not so easy… but there is still hope.

No downtime no pain

We know that we can view, open and copy the deleted files contents using the magic of /proc, but databases require that all the files stay in a consistent state. It is not only that we need to copy the files, we need to make sure that the files are not modified during the copy, after the copy and before the database re-opens the files. Even if you find a way to keep the consistent state between the /proc files and the "new" ones, usually the only way to get the database to re-open a file is to restart the database (I know there are certain types of files that are re-opened… but let's keep it this way for simplicity). Both keeping some database files untouched and restarting the database will require downtime. Keeping the files untouched means that we have to lock them for writes, to make sure we keep the consistency. And we need to restart the database to reopen these files.

Some of us have to die

Now that we know that downtime is unavoidable, we have to discuss with the organization and plan for the downtime… but we don't have a long time to do this. A database crash could make the deleted files disappear. But while we discuss the downtime… there is still one thing missing: how are we going to keep the consistency of the files during the copy? We have two options: find a way to keep the files untouched during the copy and the database shutdown, or just shut down the database. Ok, ok, ok... You told us that the files will disappear if we stop the database. And now you are telling us to stop the database? Are you crazy? No, I am not crazy. As we already know, if a file is open then it is not removed by the operating system. We also know that there is a place called /proc where we can re-open any file already opened by any other process. The only thing we need to do is keep an instance of the files in /proc, then we stop the database, perform the copy of files and start the database again. And how do we can keep an instance of the files open? Well, it's up to you... choose your favorite application or command that can open a file and keep it open. Just make sure that the file you are interested in is really open. For example, vim is a bad choice as it opens the file at the beginning, but doesn't keep it open. The command I use is tail with the -f flag to keep the file open. And make sure you don't close/kill it... or the file will disappear. (lsof should help you make sure the files are still open)
[root@mysql ~]# cd /proc/2983/fd
 [root@mysql fd]# nohup tail -f 10 > /dev/null &
 [2] 3112
 [root@mysql fd]# nohup: ignoring input and redirecting stderr to stdout
 [root@mysql fd]# nohup tail -f 12 > /dev/null &
 [3] 3113
 [root@mysql fd]# nohup: ignoring input and redirecting stderr to stdout
 [root@mysql fd]# nohup tail -f 3 > /dev/null &
 [4] 3114
 [root@mysql fd]# nohup: ignoring input and redirecting stderr to stdout
 [root@mysql fd]# nohup tail -f 9 > /dev/null &
 [5] 3115
 [root@mysql fd]# nohup: ignoring input and redirecting stderr to stdout

Oops! Where are my files?

Once we're sure that all the files we want to recover are open by another process, we may proceed to stop the database.
[root@mysql fd]# service mysql stop
 Redirecting to /bin/systemctl stop mysql.service
 [root@mysql fd]# ls -l
 total 0
"Oh come on... you told us that we were going to keep our files available, but they are gone! You are a big liar!" Yes, if you check the /proc directory, as we stopped mysql, all the symbolic links to files opened by mysqld are gone. But remember... a file disappears when ALL the processes are over. We have four processes that are keeping the files open, but these processes have different pids, so we need to find the identifiers and then we will be able to copy the files. In our case, the identifiers for the processes are 3112, 3113, 3114 and 3115.
[root@mysql fd]# ls -l /proc/3112/fd
 total 0
 lrwx------. 1 root root 64 Jun 28 10:48 0 -> /dev/pts/1
 l-wx------. 1 root root 64 Jun 28 10:48 1 -> /dev/null
 lrwx------. 1 root root 64 Jun 28 10:48 2 -> /dev/pts/1
 lr-x------. 1 root root 64 Jun 28 10:48 3 -> /var/lib/mysql/ibdata1 (deleted)
 [root@mysql fd]# ls -l /proc/3113/fd
 total 0
 lrwx------. 1 root root 64 Jun 28 10:48 0 -> /dev/pts/1
 l-wx------. 1 root root 64 Jun 28 10:48 1 -> /dev/null
 lrwx------. 1 root root 64 Jun 28 10:48 2 -> /dev/pts/1
 lr-x------. 1 root root 64 Jun 28 10:48 3 -> /var/lib/mysql/ibtmp1 (deleted)
 [root@mysql fd]# ls -l /proc/3114/fd
 total 0
 lrwx------. 1 root root 64 Jun 28 10:48 0 -> /dev/pts/1
 l-wx------. 1 root root 64 Jun 28 10:48 1 -> /dev/null
 lrwx------. 1 root root 64 Jun 28 10:48 2 -> /dev/pts/1
 lr-x------. 1 root root 64 Jun 28 10:48 3 -> /var/lib/mysql/ib_logfile0 (deleted)
 [root@mysql fd]# ls -l /proc/3115/fd
 total 0
 lrwx------. 1 root root 64 Jun 28 10:48 0 -> /dev/pts/1
 l-wx------. 1 root root 64 Jun 28 10:48 1 -> /dev/null
 lrwx------. 1 root root 64 Jun 28 10:48 2 -> /dev/pts/1
 lr-x------. 1 root root 64 Jun 28 10:48 3 -> /var/lib/mysql/ib_logfile1 (deleted)

It's alive!

Now it is time to restore the files to the original locations.
[root@mysql fd]# cp /proc/3112/fd/3 /var/lib/mysql/ibdata1
 [root@mysql fd]# cp /proc/3113/fd/3 /var/lib/mysql/ibtmp1
 [root@mysql fd]# cp /proc/3114/fd/3 /var/lib/mysql/ib_logfile0
 [root@mysql fd]# cp /proc/3115/fd/3 /var/lib/mysql/ib_logfile1
Restore ownership.
[root@mysql fd]# chown mysql:mysql /var/lib/mysql/ib*
 [root@mysql fd]#
 
And restart the database.
[root@mysql fd]# service mysql start
 Redirecting to /bin/systemctl start mysql.service
 [root@mysql ~]# mysql -p
 Enter password:
 Welcome to the MySQL monitor. Commands end with ; or \g.
 Your MySQL connection id is 3
 Server version: 5.7.22-22 Percona Server (GPL), Release 22, Revision f62d93c
 
 Copyright (c) 2009-2018 Percona LLC and/or its affiliates
 Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
 
 Oracle is a registered trademark of Oracle Corporation and/or its
 affiliates. Other names may be trademarks of their respective
 owners.
 
 Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
 mysql> select "Look! It's moving. It's alive. It's alive... It'alive, it's moving, it's alive, it's alive, it's alive, it's alive, IT'S ALIVE!";
 +---------------------------------------------------------------------------------------------------------------------------------+
 | Look! It's moving. It's alive. It's alive... It'alive, it's moving, it's alive, it's alive, it's alive, it's alive, IT'S ALIVE! |
 +---------------------------------------------------------------------------------------------------------------------------------+
 | Look! It's moving. It's alive. It's alive... It'alive, it's moving, it's alive, it's alive, it's alive, it's alive, IT'S ALIVE! |
 +---------------------------------------------------------------------------------------------------------------------------------+
 1 row in set (0.00 sec)
 
 mysql> exit
 
The database is up and running!

The sun is shining brightly again… or not!

The approach we've seen today to recover files removed by mistake has some caveats. Actually, we're not recovering removed files, we are copying removed files and this means that our filesystem must be able to store both groups of files (removed and copies) at the same time. And even if we have enough storage, when the removed files are large, this copy can take a long time to complete. Also, depending on the file type, it is possible that MySQL closes the removed files due to internal exhausted file caches. So, you need to move really fast to recover the files. The first thing you must do is execute processes that keep the removed files open (as many tail -f as removed files you have). Also, keep monitoring the MySQL error log all the time during the recovery. There are additional strategies to recover the situation. For example, locking the files to avoid changes and flushing dirty pages, then restore the files and restart, as seen in this post. Look also at the comment from Ivgeni Segal in the previous link that presents an interesting method: create a slave from a server with removed files by using symbolic links and xtrabackup and then promote the slave to master. This can be a great option when you have large files and a reduced downtime is necessary. In that case, this is probably the best approach. Other strategies (DISCLAIMER: this is just an idea, I never tested it) could involve performing an xtrabackup while copying the missing files and then apply the logs to have a consistent image. I've also seen people monitor actively for missing files. I don't think this is a good option, because you are not protecting your database; rather, you are detecting the disaster once it has happened. If you don't trust yourself and your fellow DBAs, a better option is to run a process that checks all the files and creates hard links in a different location. Then you only need to verify if one of these files is missing, and if so, use a new hard link to re-link it back to the original location. But be careful with this approach, any online change could increase the storage requirements or you could be hard-linking an old version of the file. But there are so many things that can go wrong, so many different ways to remove critical data, that I would recommend you just three things:
  1. Keep your critical databases as small as possible. You don't want to have your important database down just because you needed to spend a day recovering several TBs of old email logs and historical data.
  2. Do not plan your backups. Plan your recovery. Use several backup strategies and have at least three backups: physical, logical and binary logs. Physical is usually the fastest if you need to perform a full recovery. Logical is great if you need to recover one table or even some rows from one table. And binary logs are the only way you can perform a point in time recovery. Last, but not least: slaves are good for availability, not for recovery.
  3. Use FIT-ACER. There are so many things that can go wrong, that it is almost impossible to plan for each and every one. But you can do something better, work to avoid things going wrong as we do every day in Pythian.

No Comments Yet

Let us know what you think

Subscribe by email