Pythian Blog: Technical Track

MySQL Back to Basics: Analyze, Check, Optimize, and Repair

It felt like the right time for us to look back at some useful commands for table maintenance that some of us may not have mastered as much as we might like to think.

In my post about gathering index statistics, I referred to OPTIMIZE TABLE, ANALYZE TABLE, and REPAIR TABLE — but I never explained in depth what the different commands do, and what the differences between them are. That is what I thought I would do with this post, focusing on InnoDB and MyISAM, and the differences in how they treat those commands. I will also look at different cases and see which one is right for in each case.

In this article, I will be looking at are online tools*, rather than the other (very useful) tools that are offline–such as myisamchk or setting innodb_force_recovery and starting up the server–both of which can sometimes save our skins. I’ll cover offline methods in another post, as they are outside of the scope for this post.

ANALYZE TABLE

For InnoDB, this is a fast operation — although not exactly the most trustworthy statistics come from it. The reason, is that with InnoDB, when you execute ANALYZE TABLE randomTable, in order to update the index cardinality, there will be 10 random dives into each index, fetching an estimate cardinality and presenting that to you and the optimizer. Hence, several ANALYZE TABLEs in a row can (and probably will) produce different results each time. Be aware however, that this causes a write lock on the table.

With MyISAM, this operation may take longer, as it will scan the index and populate it accordingly. It only places a read lock while doing so, but on the plus side, the statistics gathered is trustworthy and the information you get is correct.

One of the reasons to issue ANALYZE TABLE is that when we have a lot of modification of a table (INSERT or DELETE for example), over time the optimizer might not make the best choices when trying to decide if it should use a specific index or not. By analyzing the table, we help it to make a more educated choice.

If we are in a replicated setup, and don’t want the ANALYZE to spread from the master, use ANALYZE NO_WRITE_TO_BINLOG TABLE randomTable, and that will stop it from being replicated. (Or do:

SET SQL_LOG_BIN=0;
ANALYZE TABLE randomTable;
SET SQL_LOG_BIN=1;

which will temporarily disable logging to binary log for your session.)

CHECK TABLE

A command that works with InnoDB, ARCHIVE, and MyISAM, CHECK TABLE does what it sounds like — it check a table for errors or other issues.

CHECK TABLE also checks if a table is compatible with the current version after an upgrade (CHECK TABLE randomTable FOR UPGRADE), and if it’s found not to be compatible, a full check is done and then the .frm file is updated with the current version number.

With MyISAM, you have the option of specifying different levels of checking. (You can specify it with InnoDB as well, but it is just ignored.) The different levels are:

  • EXTENDED
  • MEDIUM
  • CHANGED
  • FAST
  • QUICK

If you do not specify any options, CHECK TABLE uses the MEDIUM option, which checks the key checksums and also that all links are valid.

The output might look like this:

sql01 sakila> CHECK TABLE film_actor EXTENDED;
+-------------------+-------+----------+----------+
| Table             | Op    | Msg_type | Msg_text |
+-------------------+-------+----------+----------+
| sakila.film_actor | check | status   | OK       |
+-------------------+-------+----------+----------+
1 row in set (0.09 sec)

Or like this:

sql01 sakila> CHECK TABLE rental_rep \G
*************************** 1. row ***************************
   Table: sakila.rental_rep
      Op: check
Msg_type: warning
Msg_text: 1 client is using or hasn't closed the table properly
*************************** 2. row ***************************
   Table: sakila.rental_rep
      Op: check
Msg_type: warning
Msg_text: Size of indexfile is: 26052608      Should be: 25295872
*************************** 3. row ***************************
   Table: sakila.rental_rep
      Op: check
Msg_type: warning
Msg_text: Size of datafile is: 20836352       Should be: 20185030
*************************** 4. row ***************************
   Table: sakila.rental_rep
      Op: check
Msg_type: error
Msg_text: Key in wrong position at page 15993856
*************************** 5. row ***************************
   Table: sakila.rental_rep
      Op: check
Msg_type: error
Msg_text: Corrupt
5 rows in set (0.14 sec)

EXTENDED, checks all keys for all rows. This is a very slow operation.

QUICK doesn’t check the rows for invalid links.

FAST only checks tables that wasn’t closed in the right way.

CHANGED
checks for tables that wern’t closed properly or that have been changed since the last time you ran a check.

When running your cron jobs to check if a table is okay, the preferred method of doing so is FAST, as it is quick and doesn’t interrupt normal operations too much, while it still catches the major problems. Then perhaps ANALYZE TABLE biweekly/monthly, to keep them up to date (if you have a lot of changing data — for a static table, this is not needed).

OPTIMIZE TABLE

Just as with CHECK TABLE, the compatibility with various storage engines isn’t the best for OPTIMIZE, but let’s see why, and what it means, after we’ve taken a look at its usage.

OPTIMIZE TABLE is used to remove overhead, sort indexes for better access, and generally keep your tables in good order — but is often overrated and used more often than necessary. I’ve seen a lot of people using OPTIMIZE TABLE daily or even hourly, but that is generally not needed, unless you do massive amount of INSERT and/or DELETE on a few tables, and even then daily or weekly will be enough — just remember to run it only on the affected tables.

Just like ANALYZE TABLE, you can specify NO_WRITE_TO_BINLOG to avoid logging the statement and having it spread to your slaves if you are using a replication setup.

First, the limitations.

OPTIMIZE TABLE will not sort R-tree indexes unless you are using version >= 5.0.34 OR >= 5.1.15, in which this issue has been fixed.

With InnoDB, OPTIMIZE yet again maps to an ALTER TABLE statement, which means that there will be a table rebuild, which in turn updates the indexes (causing, obviously, a lock on the table).

For MyISAM, it starts by locking the table, then sorts the index pages and updates the statistics (if they’re not already up to date).

However, if you try to optimize a crashed table:

sql01 sakila> OPTIMIZE TABLE rental_rep \G
*************************** 1. row ***************************
   Table: sakila.rental_rep
      Op: optimize
Msg_type: error
Msg_text: Table './sakila/rental_rep' is marked as crashed and should be repaired
1 row in set, 2 warnings (0.00 sec)

REPAIR

So, let’s skip ahead and look at how to repair this table:

sql01 sakila> REPAIR TABLE rental_rep \G
*************************** 1. row ***************************
   Table: sakila.rental_rep
      Op: repair
Msg_type: warning
Msg_text: Number of rows changed from 651130 to 672140
*************************** 2. row ***************************
   Table: sakila.rental_rep
      Op: repair
Msg_type: status
Msg_text: OK
2 rows in set (8.52 sec)

So we just solved the problem (I faked a server crash by killing the mysqld process while inserting data to the table). But let’s do something else to this table, before I get into explaining how REPAIR works:

%rm var/sakila/rental_rep.MYI
%ls var/sakila/rental_rep.*
var/sakila/rental_rep.MYD       var/sakila/rental_rep.frm
%

Ouch! We don’t have the index data there any longer. What does the server say?

sql01 sakila> CHECK TABLE rental_rep \G
*************************** 1. row ***************************
   Table: sakila.rental_rep
      Op: check
Msg_type: error
Msg_text: Can't find file: 'rental_rep' (errno: 2)
1 row in set (0.00 sec) 

sql01 sakila> SELECT * FROM rental_rep LIMIT 1;
ERROR 1017 (HY000): Can't find file: 'rental_rep' (errno: 2)
sql01 sakila> SHOW CREATE TABLE rental_rep;
ERROR 1017 (HY000): Can't find file: 'rental_rep' (errno: 2)

sql01 sakila> REPAIR TABLE rental_rep \G
*************************** 1. row ***************************
   Table: sakila.rental_rep
      Op: repair
Msg_type: error
Msg_text: Can't find file: 'rental_rep' (errno: 2)
1 row in set, 1 warning (0.00 sec)

That looks pretty bad, no? errno: 2 means “No such file or directory” which we know, since we deleted the file — but can we solve this without restoring from a backup?

Yes! There is a specific usage for this with REPAIR TABLE:

sql01 sakila> REPAIR TABLE rental_rep USE_FRM \G
*************************** 1. row ***************************
   Table: sakila.rental_rep
      Op: repair
Msg_type: warning
Msg_text: Number of rows changed from 0 to 672140
*************************** 2. row ***************************
   Table: sakila.rental_rep
      Op: repair
Msg_type: status
Msg_text: OK
2 rows in set (8.17 sec) 

sql01 sakila> CHECK TABLE rental_rep;
+-------------------+-------+----------+----------+
| Table             | Op    | Msg_type | Msg_text |
+-------------------+-------+----------+----------+
| sakila.rental_rep | check | status   | OK       |
+-------------------+-------+----------+----------+
1 row in set (0.80 sec)

sql01 sakila> SELECT * FROM rental_rep LIMIT 1 \G
*************************** 1. row ***************************
   rental_id: 1
 rental_date: 2005-05-24 23:04:41
inventory_id: 2452
 customer_id: 333
 return_date: 2005-06-03 01:43:41
    staff_id: 2
 last_update: 2006-02-15 21:30:53
1 row in set (0.00 sec)

So we have just successfully restored the index data, and made the table usable again, without restoring from a backup. Pretty neat! Let me explain what just happened, and how REPAIR works
REPAIR TABLE tries to repair a corrupted or broken table, and it can also be used as REPAIR NO_WRITE_TO_BINLOG TABLE randomTable to disable writing to the binary log and replicate the statement.

REPAIR TABLE works for MyISAM and ARCHIVE, but with some caveats. For example, there is a risk of data loss if something happens during the repair.

Besides USE_FRM, REPAIR has two other options that can be used. The first, EXTENDED, basically means that the indexes are repaired row by row, rather than creating one index at a time with sorting.

The other option, QUICK, means that REPAIR will only try to repair the index rather than index and data.

So, what about USE_FRM which I used above? Well, it’s a bit of a double-edged sword, one that might save you from a lot of headaches, but which might also give you even more trouble if you are not careful.

If you create table rental_rep with version 5.0.27, upgrade to 5.0.51a for example, and then someone removes your MYI file, and you run the above command, you risk losing all data in the table, because of the difference in MySQL version that the table was created with. However, if you are running version >= 5.0.62 (currently Enterprise only), there will instead be an error such as Failed repairing incompatible .FRM file, making sure you do not make a mistake.

However, if you do successfully import your data after the version upgrade, and perform a vanilla REPAIR TABLE <table> (or CHECK TABLE <table> FOR UPGRADE), the .frm will be upgraded to the current version, and potentially solve your problem.

With an ARCHIVE table, REPAIR TABLE might also improve the compression, depending on how much data addition there has been.

* (Well, as online as it gets — table locking is still done.)

No Comments Yet

Let us know what you think

Subscribe by email