Pythian Blog: Technical Track

Performance problems with Dynamic Statistics in Oracle 12c

I’ve been making some tests recently with the new Oracle 12.1.0.2 In-Memory option and have been faced with an unexpected  performance problem.  Here is a test case:

create table tst_1 as
with q as (select 1 from dual connect by level <= 100000)
select rownum id, 12345 val, mod(rownum,1000) ref_id  from q,q
where rownum <= 200000000;
Table created.
create table tst_2 as select rownum ref_id, lpad(rownum,10, 'a') name, rownum || 'a' name2</pre>
from dual connect by level <= 1000;
Table created.
begin
dbms_stats.gather_table_stats(
ownname          => user,
tabname          =>'TST_1',
method_opt       => 'for all columns size 1',
degree => 8
);
dbms_stats.gather_table_stats(
ownname          => user,
tabname          =>'TST_2',
method_opt       => 'for all columns size 1'
);
end;
/
PL/SQL procedure successfully completed.
alter table tst_1 inmemory;
Table altered.
select count(*) from tst_1;
COUNT(*)
----------
200000000

Waiting for in-memory segment population:

select segment_name, bytes, inmemory_size from v$im_segments;
SEGMENT_NAME         BYTES INMEMORY_SIZE
--------------- ---------- -------------
TST_1           4629463040    3533963264

Now let’s make a simple two table join:

select name, sum(val) from tst_1 a, tst_2 b where a.ref_id = b.ref_id and name2='50a'
group by name;
Elapsed: 00:00:00.17

Query runs pretty fast. Execution plan has the brand new vector transformation

Execution Plan
----------------------------------------------------------
Plan hash value: 213128033
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                          |     1 |    54 |  7756  (21)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION        |                          |       |       |            |          |
|   2 |   LOAD AS SELECT                  | SYS_TEMP_0FD9D66FA_57B2B |       |       |            |          |
|   3 |    VECTOR GROUP BY                |                          |     1 |    24 |     5  (20)| 00:00:01 |
|   4 |     KEY VECTOR CREATE BUFFERED    | :KV0000                  |     1 |    24 |     5  (20)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL            | TST_2                    |     1 |    20 |     4   (0)| 00:00:01 |
|   6 |   HASH GROUP BY                   |                          |     1 |    54 |  7751  (21)| 00:00:01 |
|*  7 |    HASH JOIN                      |                          |     1 |    54 |  7750  (21)| 00:00:01 |
|   8 |     VIEW                          | VW_VT_377C5901           |     1 |    30 |  7748  (21)| 00:00:01 |
|   9 |      VECTOR GROUP BY              |                          |     1 |    13 |  7748  (21)| 00:00:01 |
|  10 |       HASH GROUP BY               |                          |     1 |    13 |  7748  (21)| 00:00:01 |
|  11 |        KEY VECTOR USE             | :KV0000                  |   200K|  2539K|  7748  (21)| 00:00:01 |
|* 12 |         TABLE ACCESS INMEMORY FULL| TST_1                    |   200M|  1716M|  7697  (21)| 00:00:01 |
|  13 |     TABLE ACCESS FULL             | SYS_TEMP_0FD9D66FA_57B2B |     1 |    24 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("NAME2"='50a')
   7 - access("ITEM_5"=INTERNAL_FUNCTION("C0") AND "ITEM_6"="C2")
  12 - inmemory(SYS_OP_KEY_VECTOR_FILTER("A"."REF_ID",:KV0000))
       filter(SYS_OP_KEY_VECTOR_FILTER("A"."REF_ID",:KV0000))
Note
-----
   - vector transformation used for this statement

After having such impressive performance I’ve decided to run the query in parallel:

select /*+ parallel(8) */ name, sum(val) from tst_1 a, tst_2 b
where a.ref_id = b.ref_id and name2='50a'
group by name;
Elapsed: 00:01:02.55

Query elapsed time suddenly dropped from 0.17 seconds to the almost 1 minute and 3 seconds. But the second execution runs in 0.6 seconds.
The new plan is:

Execution Plan
----------------------------------------------------------
Plan hash value: 3623951262
-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |    29 |  1143  (26)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                     |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)               | :TQ10001 |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                    |          |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                      |          |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH                   | :TQ10000 |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY                 |          |     1 |    29 |  1143  (26)| 00:00:01 |  Q1,00 | PCWP |            |
|*  7 |        HASH JOIN                    |          |   200K|  5664K|  1142  (26)| 00:00:01 |  Q1,00 | PCWP |            |
|   8 |         JOIN FILTER CREATE          | :BF0000  |     1 |    20 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|*  9 |          TABLE ACCESS FULL          | TST_2    |     1 |    20 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
|  10 |         JOIN FILTER USE             | :BF0000  |   200M|  1716M|  1069  (21)| 00:00:01 |  Q1,00 | PCWP |            |
|  11 |          PX BLOCK ITERATOR          |          |   200M|  1716M|  1069  (21)| 00:00:01 |  Q1,00 | PCWC |            |
|* 12 |           TABLE ACCESS INMEMORY FULL| TST_1    |   200M|  1716M|  1069  (21)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("A"."REF_ID"="B"."REF_ID")
   9 - filter("NAME2"='50a')
  12 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"A"."REF_ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"A"."REF_ID"))
Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 8 because of hint

We can see a Bloom filter instead of key vector, but this is not the issue. Problem is coming from the “dynamic statistics used: dynamic sampling (level=AUTO)” note.
In 10046 trace file I’ve found nine dynamic sampling queries and one of them was this one:

SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
  optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
  */ SUM(C1)
FROM
 (SELECT /*+ qb_name("innerQuery")  */ 1 AS C1 FROM (SELECT /*+
  NO_VECTOR_TRANSFORM ORDERED */ "A"."VAL" "ITEM_1","A"."REF_ID" "ITEM_2"
  FROM "TST_1" "A") "VW_VTN_377C5901#0", (SELECT /*+ NO_VECTOR_TRANSFORM
  ORDERED */ "B"."NAME" "ITEM_3","B"."REF_ID" "ITEM_4" FROM "TST_2" "B" WHERE
  "B"."NAME2"='50a') "VW_VTN_EE607F02#1" WHERE ("VW_VTN_377C5901#0"."ITEM_2"=
  "VW_VTN_EE607F02#1"."ITEM_4")) innerQuery
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     43.92      76.33          0          5          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     43.92      76.33          0          5          0           0
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 64     (recursive depth: 1)
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  RESULT CACHE  56bn7fg7qvrrw1w8cmanyn3mxr (cr=0 pr=0 pw=0 time=0 us)
         0          0          0   SORT AGGREGATE (cr=0 pr=0 pw=0 time=8 us)
         0          0          0    HASH JOIN  (cr=0 pr=0 pw=0 time=4 us cost=159242 size=2600000 card=200000)
 200000000  200000000  200000000     TABLE ACCESS INMEMORY FULL TST_1 (cr=3 pr=0 pw=0 time=53944537 us cost=7132 size=800000000 card=200000000)
         0          0          0     TABLE ACCESS FULL TST_2 (cr=0 pr=0 pw=0 time=3 us cost=4 size=9 card=1)
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  asynch descriptor resize                        1        0.00          0.00
  Disk file operations I/O                        1        0.00          0.00
  CSS initialization                              1        0.00          0.00
  CSS operation: action                           1        0.00          0.00
  direct path write temp                       6267        0.02         30.37
********************************************************************************

Vector transformation is disabled, inefficient table order is fixed by the ORDERING hint and we are waiting for hash table creation based on huge TST_1 table.
Dynamic statistics feature has been greatly improved in Oracle 12c  with the support for joins and group by predicates. This is why we have such join during the parse time. Next document has the”Dynamic Statistics (previously known as dynamic sampling)” section inside: Understanding Optimizer Statistics with Oracle Database 12c where the new functionality is described.

Let’s make a simpler test:

select /*+ parallel(2) */ ref_id, sum(val) from tst_1 a group by ref_id;
Execution Plan
----------------------------------------------------------
Plan hash value: 2527371111
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |          |  1000 |  9000 |  7949  (58)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                   |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)             | :TQ10001 |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                  |          |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE                    |          |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH                 | :TQ10000 |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY               |          |  1000 |  9000 |  7949  (58)| 00:00:01 |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR          |          |   200M|  1716M|  4276  (21)| 00:00:01 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS INMEMORY FULL| TST_1    |   200M|  1716M|  4276  (21)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------------
Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
   - Degree of Parallelism is 2 because of hint

We can see a “dynamic statistics used” note again. It’s a simple query without predicates with the single table with pretty accurate statistics. From my point of view, here is no reason for dynamic sampling at all.
Automatic dynamic sampling was introduced in 11G Release 2. Description of this feature can be found in this document: Dynamic sampling and its impact on the Optimizer.
“From Oracle Database 11g Release 2 onwards the optimizer will automatically decide if dynamic sampling will be useful and what dynamic sampling level will be used for SQL statements executed in parallel. This decision is based on size of the tables in the statement and the complexity of the predicates”.
Looks like algorithm has been changed in 12c and dynamic sampling is triggered in a broader set of use cases.
This behavior can be disabled at statement, session or system level using the fix control for the bug 7452863. For example,
ALTER SESSION SET “_fix_control”=’7452863:0′;

Summary

Dynamic statistics has been enhanced in Oracle 12c, but this can lead to a longer parse time.
Automatic dynamic statistics is used more often in 12c which can lead to a parse time increase in the more cases than before.

No Comments Yet

Let us know what you think

Subscribe by email