Pythian Blog: Technical Track

In-Memory Column Store: 10046 may be lying to you!

The Oracle In-Memory Column Store (IMC) is a new database option available to Oracle Database Enterprise Edition (EE) customers. It introduces a new memory area housed in your SGA, which makes use of the new compression functionality brought by the Oracle Exadata platform, as well as the new column oriented data storage vs the traditional row oriented storage. Note: you don’t need to be running on Exadata to be able to use the IMC!


Part I – How does it work?

In this part we’ll take a peek under the hood of the IMC and check out some of its internal mechanics.

Let’s create a sample table which we will use for our demonstration:

create table test inmemory priority high
as
select a.object_name as name, rownum as rn,
sysdate + rownum / 10000 as dt
from all_objects a, (select rownum from dual connect by level <= 500)
/

Almost immediately upon creating this table, the w00? processes will wake up from sleeping on the event ‘Space Manager: slave idle wait’ and start their analysis to check out the new table. By the way, the sleep times for this event are between 3 and 5 seconds, so it’s normal if you experience a little bit of a delay.

The process who picked it up will then create a new entry in the new dictionary table compression$, such as this one:

exec pt('select ts#,file#,block#,obj#,dataobj#,ulevel,sublevel,ilevel,flags,bestsortcol, tinsize,ctinsize,toutsize,cmpsize,uncmpsize,mtime,spare1,spare2,spare3,spare4 from compression$');
TS# : 4
FILE# : 4
BLOCK# : 130
OBJ# : 20445
DATAOBJ# : 20445
ULEVEL : 5
SUBLEVEL : 9
ILEVEL : 1582497813
FLAGS :
BESTSORTCOL : -1
TINSIZE : 16339840
CTINSIZE :
TOUTSIZE : 9972219
CMPSIZE :
UNCMPSIZE :
MTIME : 13-may-2014 23:14:46
SPARE1 : 31
SPARE2 : 5256
SPARE3 : 571822
SPARE4 :

Plus, there is also a BLOB column in compression$, which holds the analyzer’s findings:

select analyzer from compression$;</code>
ANALYZER
------------------------------------------------------------------------------
004B445A306AD5025A0000005A6B8E0200000300000000000001020000002A0000003A0000004A(output truncated for readability)

A quick check reveals that this is indeed our object:

exec pt('select object_name, object_type, owner from dba_objects where data_object_id = 20445');
OBJECT_NAME : TEST
OBJECT_TYPE : TABLE
OWNER : FOO
-----------------

PL/SQL procedure successfully completed.

And we can see the object is now stored in the IMC by looking at v$im_segments:

exec pt('select * from v$im_segments');
OWNER : FOO
SEGMENT_NAME : TEST
PARTITION_NAME :
SEGMENT_TYPE : TABLE
TABLESPACE_NAME : USERS
INMEMORY_SIZE : 102301696
BYTES : 184549376
BYTES_NOT_POPULATED : 0
POPULATE_STATUS : COMPLETED
INMEMORY_PRIORITY : HIGH
INMEMORY_DISTRIBUTE : AUTO
INMEMORY_DUPLICATE : NO DUPLICATE
INMEMORY_COMPRESSION : FOR QUERY LOW
CON_ID : 0
-----------------

PL/SQL procedure successfully completed.

Thus, we are getting the expected performance benefit of it being in the IMC:

alter session set inmemory_query=disable;</code>
Session altered.
Elapsed: 00:00:00.01
select count(*) from test;
COUNT(*)
----------
4187500
Elapsed: 00:00:03.96
alter session set inmemory_query=enable;
Session altered.
Elapsed: 00:00:00.01
select count(*) from test;
COUNT(*)
----------
4187500
Elapsed: 00:00:00.13

So far, so good.


Part II – Execution Plans

Some things we need to be aware of, though, when we are using the IMC in 12.1.0.2. One of them being that we can’t always trust in the execution plans anymore.

Let’s go back to our original sample table and recreate it using the default setting of INMEMORY PRIORITY NONE.

drop table test purge
/
create table test inmemory priority none
as
select a.object_name as name, rownum as rn,
sysdate + rownum / 10000 as dt
from all_objects a, (select rownum from dual connect by level &lt;= 500)
/

Now let’s see what plan we’d get if we were to query it right now:

explain plan for select name from test where name = 'ALL_USERS';
Explained.
@?/rdbms/admin/utlxpls
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1357081020
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 614 | 12280 | 811 (73)| 00:00:01 |
|* 1 | TABLE ACCESS INMEMORY FULL| TEST | 614 | 12280 | 811 (73)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - inmemory("NAME"='ALL_USERS')
filter("NAME"='ALL_USERS')
14 rows selected.

Okay, you might say now that EXPLAIN PLAN is only a guess. It’s not the real plan, and the real plan has to be different. And you would be right. Usually.

Watching the slave processes, there is no activity related to this table. Since it’s PRIORITY is NONE, it won’t be loaded into IMC until it’s actually queried for the first or second time around.

So let’s take a closer look than, shall we:

alter session set tracefile_identifier='REAL_PLAN';
Session altered.
alter session set events '10046 trace name context forever, level 12';
Session altered.
select name from test where name = 'ALL_USERS';
Now let’s take a look at the STAT line on that tracefile. Note: I closed the above session to make sure that we’ll get the full trace data.

PARSING IN CURSOR #140505885438688 len=46 dep=0 uid=64 oct=3 lid=64 tim=32852930021 hv=3233947880 ad='b4d04b00' sqlid='5sybd9b0c4878'
select name from test where name = 'ALL_USERS'
END OF STMT
PARSE #140505885438688:c=6000,e=10014,p=0,cr=2,cu=0,mis=1,r=0,dep=0,og=1,plh=1357081020,tim=32852930020
EXEC #140505885438688:c=0,e=58,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=32852930241
WAIT #140505885438688: nam='SQL*Net message to client' ela= 25 driver id=1650815232 #bytes=1 p3=0 obj#=20466 tim=32852930899
WAIT #140505885438688: nam='direct path read' ela= 13646 file number=4 first dba=21507 block cnt=13 obj#=20466 tim=32852950242
WAIT #140505885438688: nam='direct path read' ela= 2246 file number=4 first dba=21537 block cnt=15 obj#=20466 tim=32852953528
WAIT #140505885438688: nam='direct path read' ela= 1301 file number=4 first dba=21569 block cnt=15 obj#=20466 tim=32852955406
…
FETCH #140505885438688:c=182000,e=3365871,p=17603,cr=17645,cu=0,mis=0,r=9,dep=0,og=1,plh=1357081020,tim=32857244740
STAT #140505885438688</strong> id=1 cnt=1000 pid=0 pos=1 obj=20466 op='TABLE ACCESS INMEMORY FULL TEST (cr=22075 pr=22005</strong> pw=0 time=865950 us cost=811 size=12280 card=614)'

So that’s still the wrong one right there, and the STAT line even clearly shows that we’ve actually done 22005 physical reads, and therefore likely no in-memory scan, but a full scan from disk. There’s clearly a bug there with the execution plan reported, which is plain wrong.

Thus, be careful about using INMEMORY PRIORITY NONE, as you may not get what you expect. Since the PRIORITY NONE settings may also be overridden by any other PRIORITY settings, your data may get flushed out of the IMC, even though your execution plans will say otherwise. And I’m sure many of you know it’s often not slow response times on queries which cause a phone ringing hot. It’s inconsistent response times. This feature, if used inappropriately will pretty much guarantee inconsistent response times.

Apparently, what we should be doing is size up the In Memory Column store appropriately, to hold the objects we actually need to be in there. And make sure they’re always in there by setting a PRIORITY of LOW or higher. Use CRITICAL and HIGH to ensure the most vital objects of the application are populated first.

There was one other oddity that I noticed while tracing the W00? processes.


Part III – What are you scanning, Oracle ?

The m000 process’ trace file reveals many back-to-back executions of this select:

PARSING IN CURSOR #140670951860040 len=104 dep=1 uid=0 oct=3 lid=0 tim=23665542991 hv=2910336760 ad='fbd06928' sqlid='24uqc4aqrhdrs'
select /*+ result_cache */ analyzer from compression$ where obj#=:1 and ulevel=:2

They all supply the same obj# bind value, which is our table’s object number. The ulevel values used vary between executions.


However, looking at the related WAIT lines for this cursor, we see:

WAIT #140670951860040: nam='direct path read' ela= 53427 file number=4 first dba=18432 block <strong>cnt=128</strong> obj#=20445 tim=23666569746
WAIT #140670951860040: nam='direct path read' ela= 38073 file number=4 first dba=18564 block <strong>cnt=124</strong> obj#=20445 tim=23666612210
WAIT #140670951860040: nam='direct path read' ela= 38961 file number=4 first dba=18816 block cnt=128 obj#=20445 tim=23666665534
WAIT #140670951860040: nam='direct path read' ela= 39708 file number=4 first dba=19072 block cnt=128 obj#=20445 tim=23666706469
WAIT #140670951860040: nam='direct path read' ela= 40242 file number=4 first dba=19328 block cnt=128 obj#=20445 tim=23666749431
WAIT #140670951860040: nam='direct path read' ela= 39147 file number=4 first dba=19588 block cnt=124 obj#=20445 tim=23666804243
WAIT #140670951860040: nam='direct path read' ela= 33654 file number=4 first dba=19840 block cnt=128 obj#=20445 tim=23666839836
WAIT #140670951860040: nam='direct path read' ela= 38908 file number=4 first dba=20096 block cnt=128 obj#=20445 tim=23666881932
WAIT #140670951860040: nam='direct path read' ela= 40605 file number=4 first dba=20352 block cnt=128 obj#=20445 tim=23666924029
WAIT #140670951860040: nam='direct path read' ela= 32089 file number=4 first dba=20612 block cnt=124 obj#=20445 tim=23666962858
WAIT #140670951860040: nam='direct path read' ela= 36223 file number=4 first dba=20864 block cnt=128 obj#=20445 tim=23667001900
WAIT #140670951860040: nam='direct path read' ela= 39733 file number=4 first dba=21120 block cnt=128 obj#=20445 tim=23667043146
WAIT #140670951860040: nam='direct path read' ela= 17607 file number=4 first dba=21376 block cnt=128 obj#=20445 tim=23667062232

… and several more.


Now, compression$ contains only a single row. Its total extent size is neglibile as well:

select sum(bytes)/1024/1024 from dba_extents where segment_name = 'COMPRESSION$';
SUM(BYTES)/1024/1024
--------------------
.0625

So how come Oracle is reading so many blocks ? Note that each of the above waits is a multi-block read, of 128 blocks.

Let’s take a look at what Oracle is actually reading there:

begin
pt('select segment_name, segment_type, owner
from dba_extents where file_id = 4
and 18432 between block_id and block_id + blocks - 1');
end;
/
SEGMENT_NAME : TEST
SEGMENT_TYPE : TABLE
OWNER : FOO
-----------------
PL/SQL procedure successfully completed.<code>

There’s our table again. Wait. What ?

There must be some magic going on underneath the covers here. In my understanding, a plain select against table A, is not scanning table B.

If I manually run the same select statement against compression$, I get totally normal trace output.

This reminds me of the good old:

select piece from IDL_SB4$;
ERROR:
ORA-00932: inconsistent datatypes: expected CHAR got B4

But I digress.

It could simply be a bug that results in these direct path reads being allocated to the wrong cursor. Or it could be intended, as it’s indeed this process’ job to analyze and load this table, and using this the resource usage caused by this is instrumented and can be tracked?

Either way, to sum things up we can say that:

– Performance benefits can potentially be huge
– Oracle automatically scans and caches segments marked as INMEMORY PRIORITY LOW|MEDIUM|HIGH|CRITICAL (they don’t need to be queried first!)
– Oracle scans segments marked as INMEMORY PRIORITY NONE (the default) only after they’re accessed the second time – and they may get overridden by higher priorities
– Oracle analyzes the table and stores the results in compression$
– Based on that analysis, Oracle may decide to load one or the other column only into IMC, or the entire table, depending on available space, and depending on the INMEMORY clause used
– It’s the W00? processes using some magic to do this analysis and read the segment into IMC.
– This analysis is also likely to be triggered again, whenever space management of the IMC triggers again, but I haven’t investigated that yet.

No Comments Yet

Let us know what you think

Subscribe by email