Pythian Blog: Technical Track

How to fix the skip scan bug in 18c

A recent Twitter thread had an interesting test case of the optimizer choosing sub-optimal skip scan over range scan. I've checked what's going on, and it happens to be a known bug. Here's a slightly simplified test case: [sourcecode lang="sql" gutter="false" collapse="true"] drop table as_big_lookup_table cascade constraints purge; create table as_big_lookup_table as select rownum pk_col, mod(rownum,30) type_col, mod(rownum,5000) join_col, rpad('x',30,'x') padding_col from dual connect by rownum <=10000 / create index as_big_lookup_table_good on as_big_lookup_table (type_col, join_col); create index as_big_lookup_table_bad on as_big_lookup_table (pk_col, type_col, join_col); explain plan for select pk_col, join_col from as_big_lookup_table where type_col = 10 and join_col = 10; select * from table(dbms_xplan.display(format=>'basic +rows +cost +predicate')); [/sourcecode] We have a table with two similar indexes. One is a perfect match for the table access by (type_col, join_col) columns, and another is quite bad as the leading column is (essentially) primary key. But when we query the table by those two columns, for some reason Optimizer chooses to skip scan the bad index. What is going on? Let's compare the three plans:
  1. Bad plan in 18c
  2. Good plan in 18c - forced with an index_rs_asc hint
  3. Bad plan in 12.1.0.2 - forced with an index_ss hint
[sourcecode gutter="false" collapse="true"] ------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2 (0)| |* 1 | INDEX SKIP SCAN | AS_BIG_LOOKUP_TABLE_BAD | 1 | 2 (0)| ------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("TYPE_COL"=10 AND "JOIN_COL"=10) filter("JOIN_COL"=10 AND "TYPE_COL"=10) -- --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2 (0)| | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| AS_BIG_LOOKUP_TABLE | 1 | 2 (0)| |* 2 | INDEX RANGE SCAN | AS_BIG_LOOKUP_TABLE_GOOD | 1 | 1 (0)| --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TYPE_COL"=10 AND "JOIN_COL"=10) -- ------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 32 (0)| |* 1 | INDEX SKIP SCAN | AS_BIG_LOOKUP_TABLE_BAD | 1 | 32 (0)| ------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("TYPE_COL"=10 AND "JOIN_COL"=10) filter("JOIN_COL"=10 AND "TYPE_COL"=10) [/sourcecode] Comparing the costs of the first two plans, we see that they are identical, and CBO preferred the bad one. I thought it is done so due to alphabetical order, but no, it doesn't seem so. Comparing the costs of the first and third plans, we see that the skip scan cost is more reasonable in 12.1.0.2. It is close to the number of leaf blocks of the bad index. Since we have a reproducible case, we can now compare 10053 parts of the first and third plans, and see if there's an explanation for bad costing in 18c. [sourcecode gutter="false" collapse="true"] -- 18.3 ****** Costing Index AS_BIG_LOOKUP_TABLE_BAD SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SKIP_SCAN ColGroup Usage:: PredCnt: 2 Matches Full: #2 Partial: Sel: 1.0000e-04 Estimated selectivity: 0.033333 , col: #2 Estimated selectivity: 2.0000e-04 , col: #3 Access Path: index (skip-scan) SS scan sel: 1.0000e-04 SS filter sel: 1.0000e-04 ANDV (#skips): 1.000000 SS io: 1.000000 vs. table scan io: 21.000000 Skip Scan chosen Access Path: index (SkipScan) Index: AS_BIG_LOOKUP_TABLE_BAD resc_io: 2.000000 resc_cpu: 14443 ix_sel: 1.0000e-04 ix_sel_with_filters: 1.0000e-04 Cost: 2.000370 Resp: 2.000370 Degree: 1 -- 12.1.0.2 ****** Costing Index AS_BIG_LOOKUP_TABLE_BAD SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SKIP_SCAN Access Path: index (skip-scan) SS scan sel: 0.033400 SS filter sel: 0.033400 ANDV (#skips): 10000.000000 SS io: 31.000000 vs. table scan io: 21.000000 Skip Scan chosen Access Path: index (SkipScan) Index: AS_BIG_LOOKUP_TABLE_BAD resc_io: 32.000000 resc_cpu: 294686 ix_sel: 0.033400 ix_sel_with_filters: 0.033400 Cost: 32.029034 Resp: 32.029034 Degree: 1 Best:: AccessPath: IndexRange Index: AS_BIG_LOOKUP_TABLE_BAD Cost: 32.029034 Degree: 1 Resp: 32.029034 Card: 334.000000 Bytes: 0.000000 [/sourcecode] Without knowing what those numbers mean, we can see the difference in the #skips - the number of times CBO estimates skip scan to do a range scan. In 18.3 it is 1 for some reason, and that error makes the bad index look just like the proper two-column index. It also seems like this is the result of SS selectivity miscalculation: 0.0001 vs 0.033400. Is there a way to guess which bug is responsible for the skip scan costing change? Usually yes, and v$session_fix_control allows you to do that. I have a simple script called bug.sql to search it by bug descriptions, release or bug number. In this case, searching for "skip scan" shows a few hits: [sourcecode gutter="false" collapse="true"] SQL> @bug "skip scan" Opt BUGNO VALUE SQL_FEATURE DESCRIPTION features ---------- ----- ---------------------------------------- ---------------------------------------------------------------- ---------- 6070954 1 QKSFM_ACCESS_PATH_6070954 No skip scan with contiguous leading equality index keys 10.2.0.4 5714944 1 QKSFM_ACCESS_PATH_5714944 set IO cost for index skip scan to at least 1.0 10.2.0.5 7272039 1 QKSFM_ACCESS_PATH_7272039 use index cost adj when comparing skip scan with full table scan 10.2.0.5 6086930 1 QKSFM_ACCESS_PATH_6086930 correct skip scan selectivity evaluation for BETWEEN predicate 11.2.0.2 7277732 1 QKSFM_CBO_7277732 allow skip scan costing for NL with non-join predicate 11.2.0.2 8855396 1 QKSFM_ACCESS_PATH_8855396 sanity check for skip scan costing 11.2.0.2 8893626 1 QKSFM_ACCESS_PATH_8893626 apply index filter selectivity during skip scan costing 11.2.0.2 9195582 1 QKSFM_ACCESS_PATH_9195582 leaf blocks as upper limit for skip scan blocks 11.2.0.2 9227576 1 QKSFM_CBO_9227576 allow skip scan costing for semi/anti-join 11.2.0.3 10080014 1 QKSFM_CBO_10080014 allow skip scan costing for PRIOR join in CONNECT BY query 11.2.0.3 12839247 1 QKSFM_ACCESS_PATH_12839247 improve cost estimate for skip scan 11.2.0.4 13362020 1 QKSFM_CARDINALITY_13362020 fix selectivity for skip scan filter with multi column stats 11.2.0.4 14107333 1 QKSFM_INDEX_SS_14107333 improve cost estimate for skip scan due to last col 11.2.0.4 14254052 1 QKSFM_CARDINALITY_14254052 amend accounting for nulls in skip scan selectivity calculation 11.2.0.4 14254795 1 QKSFM_CARDINALITY_14254795 Consider scan and filter selectivity separately for skip scan 11.2.0.4 16555865 1 QKSFM_INDEX_SS_16555865 ignore filter selectivity on skip scan key columns 12.1.0.2 20129763 1 QKSFM_INDEX_SS_20129763 update number of skips for skip scan only when index keys remain 12.2.0.1 20587527 1 QKSFM_INDEX_SS_20587527 improve cost estimate for skip scan 12.2.0.1 20107874 1 QKSFM_INDEX_SS_20107874 adjust number of skips by current skip scan selectivity 18.1.0 4904838 1 QKSFM_CBO_4904838 allow index skip scan with no index keys 9.2.0.8 [/sourcecode] And testing if bug 20107874 is responsible for the change in skip scan costing gives a positive result, with the good index being picked up. [sourcecode lang="sql" gutter="false"] SQL> alter session set "_fix_control"='20107874:off'; -- --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2 (0)| | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| AS_BIG_LOOKUP_TABLE | 1 | 2 (0)| |* 2 | INDEX RANGE SCAN | AS_BIG_LOOKUP_TABLE_GOOD | 1 | 1 (0)| --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TYPE_COL"=10 AND "JOIN_COL"=10) [/sourcecode] Sometimes you don't even need to look at the plans, and do this type of simple test straight after identifying that something is not right with some feature and/or release. Mauro Pagano has written a tool called Pathfinder which can do this for you automatically, brute forcing all possible optimizer environments. Is there anything else to add? Yes. Two things: 1. This is a known issue and there's a patch available on MOS. 2. As you can see, the test case doesn't use dbms_stats call. That's okay in 12c+ as table statistics is created with CTAS, and index stats is created on index creation by default since 10g. But let's try to gather table stats just in case in 18c (after reverting fix control back to original setting): [sourcecode lang="sql"] exec dbms_stats.gather_table_stats('', 'AS_BIG_LOOKUP_TABLE') --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2 (0)| | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| AS_BIG_LOOKUP_TABLE | 1 | 2 (0)| |* 2 | INDEX RANGE SCAN | AS_BIG_LOOKUP_TABLE_GOOD | 1 | 1 (0)| --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TYPE_COL"=10 AND "JOIN_COL"=10) [/sourcecode] What?! Everything is great again, and skip scan is gone! This is really surprising to see. I tried to find if there is any difference in table/index stats after DBMS_STATS call, and couldn't find any; all statistics were exactly the same. I don't know how to explain this, and will be happy to hear an explanation.

No Comments Yet

Let us know what you think

Subscribe by email