Pythian Blog: Technical Track

MySQL GROUP BY and DISTINCT Oddity

At the end of my last MySQL post I mentioned strange behavior with GROUP BY and DISTINCT. In Oracle such constructions are not valid and produce an error.

Here is the test table and statements:

create table t1 (c1 int, c2 int);

insert into t1 values (1,10);
insert into t1 values (2,20);
insert into t1 values (3,30);
insert into t1 values (4,11);
insert into t1 values (4,40);
insert into t1 values (5,50);
insert into t1 values (5,12);

select c1,count(*) cnt from t1 group by c1 order by c2;

select distinct c1 from t1 order by c2;


The following two statements generate errors in Oracle for an obvious reason. In the first statement, ORDER BY clause can contain only GROUP BY expressions or aggregate functions so that the values are deterministic for each grouped row.

SQL> select c1,count(*) cnt from t1 group by c1 order by c2;
select c1,count(*) cnt from t1 group by c1 order by c2
                                                    *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

DISTINCT can be considered as a special case of GROUP BY (SELECT DISTINCT c1 from t1 is the same as SELECT c1 from t1 GROUP BY c1) and the same restrictions apply so GROUP BY clause of SELECT DISTINCT can only include expressions or columns already selected.

SQL> select distinct c1 from t1 order by c2;
select distinct c1 from t1 order by c2
                                    *
ERROR at line 1:
ORA-01791: not a SELECTed expression

Unlike Oracle, MySQL allows such combinations. Let’s see how our first statement is working in MySQL. Where in the sort output should MySQL put rows with column c1 4 and 5? Possible locations are between 1 and 2 or after 3.

mysql> select c1,count(*) cnt from t1 group by c1 order by c2;
+------+-----+
| c1   | cnt |
+------+-----+
|    1 |   1 |
|    4 |   2 |
|    2 |   1 |
|    3 |   1 |
|    5 |   2 |
+------+-----+
5 rows in set (0.03 sec)

It turned out that MySQL engine takes for с2 column the first value it comes across. It’s easy to see by adding c2 in select list:

mysql> select c1,count(*) cnt, c2 from t1 group by c1 order by c2;
+------+-----+------+
| c1   | cnt | c2   |
+------+-----+------+
|    1 |   1 |   10 |
|    4 |   2 |   11 |
|    2 |   1 |   20 |
|    3 |   1 |   30 |
|    5 |   2 |   50 |
+------+-----+------+
5 rows in set (0.00 sec)

I remember a while ago I used Microsoft Access and there was an aggregate function first() returning the first value it comes across. It looks that MySQL implicitly does the same to the columns not in GROUP BY list. However, there is no such function in the list of MySQL aggregates.

Statement with DISTINCT behaves similarly.

mysql> select distinct c1 from t1 order by c2;
+------+
| c1   |
+------+
|    1 |
|    4 |
|    2 |
|    3 |
|    5 |
+------+
5 rows in set (0.00 sec)

This MySQL “feature” could save some resources on sorts and aggregates but generally I would avoid it as this is not portable solution. Generally speaking, query output is non-deterministic — it depends on the full table scan implementation and on the physical order of rows in a table. This means that it’s actually a bug and , instead, MySQL should produce and error on those statements.

No Comments Yet

Let us know what you think

Subscribe by email