Pythian Blog: Technical Track

MySQL: Tuning filesorts and temporary tables

Goal:

Getting rid of filesorts and temporary tables by tuning MySQL queries.

Background:

Filesorts and temp tables are a necessary evil in MySQL, used when MySQL must sort the data before returning the output to the user. They are the most common issue with slow queries in MySQL, the main reason being that if the output is too large, you can kiss goodbye in-memory performance, and say hello to disk access.

Common User and Manager symptoms:

  1. Sore throats due to excessive swearing at poor database performance.
  2. Sore hip pockets due to lack of scalability requiring continuous hardware purchases.
  3. Sore users who are sitting on high speed bandwidth and have to wait more than 1.5 secs for a response from any web page. Google has raised the bar, time to ditch the straddle technique and go for the Fosbury Flop.

Appropriate Medicine:

The quickest way to get rid of temp tables and filesorts, is to have already done the sorting work for MySQL in the form of an appropriate index.

Caveat:

Think of good indexes as an investment in improving read performance at the expense of write performance. Let your database usage guide what indexes to add. Please reread that last sentence — it is important.

What is the appropriate index?

An appropriate index is one that allows MySQL to use that index to sort instead of sorting on the fly.

Looking at the columns in the GROUP BY and ORDER BY part of the SQL is a good start. Testing in an appropriate–sized test environment is also a good idea. The order of the columns in a combined column index also matters.

The gospel according to MySQL is contained in the ORDER BY optimization and LIMIT optimization documents. As my teammate Raj Thukral mentioned, the 2nd paragraph and examples in the ORDER BY doc provide key guidelines in getting MySQL to use your new index.

The Bottom Line:

Using an appropriate covering index scales; as the dataset grows, any filesorts and temporary tables are not going to scale. See the difference in timings below. With an index, the timing remains the same.

Case Study:

Here is an example using MySQL 5.0 on my PC (Pentium 4 2.6 Ghz, 1.5Gig RAM).

The dataset is a freely available dataset of movie ratings.

mysql> desc user_ratings;
+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| userid        | int(11)      | YES  |     | NULL    |       |
| num_ratings   | bigint(21)   | NO   |     | 0       |       |
| avg_rating    | double(17,4) | YES  |     | NULL    |       |
| stddev_rating | double(17,4) | YES  |     | NULL    |       |
| Rating1       | double(17,0) | YES  |     | NULL    |       |
| Rating2       | double(17,0) | YES  |     | NULL    |       |
| Rating3       | double(17,0) | YES  |     | NULL    |       |
| Rating4       | double(17,0) | YES  |     | NULL    |       |
| Rating5       | double(17,0) | YES  |     | NULL    |       |
+---------------+--------------+------+-----+---------+-------+
9 rows in set (0.05 sec)
mysql> select userid,num_ratings from user_ratings order by num_ratings desc limit 10;
+--------+-------------+
| userid | num_ratings |
+--------+-------------+
|    405 |         737 |
|    655 |         685 |
|     13 |         636 |
|    450 |         540 |
|    276 |         518 |
|    416 |         493 |
|    537 |         490 |
|    303 |         484 |
|    234 |         480 |
|    393 |         448 |
+--------+-------------+
10 rows in set (0.05 sec)
mysql> explain select userid,num_ratings from user_ratings order by num_ratings desc limit 10;
+----+-------------+--------------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+-------------+--------------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | user_ratings | ALL  | NULL          | NULL | NULL    | NULL | 1009 | Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+------+----------------+

1 row in set (0.00 sec)
mysql> show index from user_ratings;
Empty set (0.00 sec)
mysql> create index ix_movie1_urate_numrate on user_ratings(num_ratings);
Query OK, 943 rows affected (0.08 sec)
Records: 943  Duplicates: 0  Warnings: 0
mysql> explain select userid,num_ratings from user_ratings order by num_ratings desc limit 10;
+----+-------------+--------------+-------+---------------+-------------------------+---------+------+------+-------+
| id | select_type | table        | type  | possible_keys | key                     | key_len | ref  | rows | Extra |
+----+-------------+--------------+-------+---------------+-------------------------+---------+------+------+-------+
|  1 | SIMPLE      | user_ratings | index | NULL          | ix_movie1_urate_numrate | 8       | NULL | 1037 |       |
+----+-------------+--------------+-------+---------------+-------------------------+---------+------+------+-------+

1 row in set (0.00 sec)
mysql> select userid,num_ratings from user_ratings order by num_ratings desc limit 10;
+--------+-------------+
| userid | num_ratings |
+--------+-------------+
|    405 |         737 |
|    655 |         685 |
|     13 |         636 |
|    450 |         540 |
|    276 |         518 |
|    416 |         493 |
|    537 |         490 |a
|    303 |         484 |
|    234 |         480 |
|    393 |         448 |
+--------+-------------+
10 rows in set (0.00 sec)

On a slightly larger dataset… the movie rating dataset from Netflix, available as part of the $1 million challenge.

mysql> desc user_rating_stats;
+---------------+--------------+------+-----+---------+-------+
| Field         | Type         | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+-------+
| userid        | mediumint(9) | YES  | MUL | NULL    |       |
| num_ratings   | int(11)      | YES  |     | NULL    |       |
| avg_rating    | double       | YES  |     | NULL    |       |
| stddev_rating | double       | YES  |     | NULL    |       |
| Rating1       | int(11)      | YES  |     | NULL    |       |
| Rating2       | int(11)      | YES  |     | NULL    |       |
| Rating3       | int(11)      | YES  |     | NULL    |       |
| Rating4       | int(11)      | YES  |     | NULL    |       |
| Rating5       | int(11)      | YES  |     | NULL    |       |
+---------------+--------------+------+-----+---------+-------+
9 rows in set (0.03 sec)
mysql> show index from user_rating_stats;
+-------------------+------------+----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table             | Non_unique | Key_name                         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------------+------------+----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| user_rating_stats |          1 | ix_netflix_user_rate_stat_userid |            1 | userid      | A         |      480189 |   NULL   | NULL   | YES  | BTREE      |         |
+-------------------+------------+----------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
1 row in set (0.00 sec)
mysql> select userid,num_ratings from user_rating_stats order by num_ratings desc limit 10;
+---------+-------------+
| userid  | num_ratings |
+---------+-------------+
|  305344 |       17653 |
|  387418 |       17436 |
| 2439493 |       16565 |
| 1664010 |       15813 |
| 2118461 |       14831 |
| 1461435 |        9822 |
| 1639792 |        9767 |
| 1314869 |        9740 |
| 2606799 |        9064 |
| 1932594 |        8880 |
+---------+-------------+
10 rows in set (1.67 sec)
mysql> create index ix_netflix_uratestat_numrate on user_rating_stats(num_ratings);
Query OK, 480189 rows affected (3.22 sec)
Records: 480189  Duplicates: 0  Warnings: 0
mysql> explain select userid,num_ratings from user_rating_stats order by num_ratings desc limit 10;
+----+-------------+-------------------+-------+---------------+------------------------------+---------+------+--------+-------+
| id | select_type | table             | type  | possible_keys | key                          | key_len | ref  | rows   | Extra |
+----+-------------+-------------------+-------+---------------+------------------------------+---------+------+--------+-------+
|  1 | SIMPLE      | user_rating_stats | index | NULL          | ix_netflix_uratestat_numrate | 5       | NULL | 480189 |       |
+----+-------------+-------------------+-------+---------------+------------------------------+---------+------+--------+-------+

1 row in set (0.01 sec)
mysql> select userid,num_ratings from user_rating_stats order by num_ratings desc limit 10;
+---------+-------------+
| userid  | num_ratings |
+---------+-------------+
|  305344 |       17653 |
|  387418 |       17436 |
| 2439493 |       16565 |
| 1664010 |       15813 |
| 2118461 |       14831 |
| 1461435 |        9822 |
| 1639792 |        9767 |
| 1314869 |        9740 |
| 2606799 |        9064 |
| 1932594 |        8880 |
+---------+-------------+
10 rows in set (0.00 sec)

Have Fun.

–Paul

No Comments Yet

Let us know what you think

Subscribe by email