
Mysterious rollback and replay with partitions

Recently I was loading a partitioned table and, during the load, which was going to take hours, I realised that I wanted to add DEFAULT and NULL partitions to make future inserts more robust. I was able to add the extra partitions, but then noticed that my INSERT was rolling back. It had already been running for a few hours, so I was dismayed at the prospect of waiting for the rollback and then re-issuing the command; to my astonishment, however, the rollback completed, the statement automatically replayed, and it went on to completion. This took much longer than anticipated, but I was grateful that it worked. Following this behaviour I decided to investigate a little deeper and was able to reproduce the issue.

Oracle Database version: 12.2.0.1

The load table was list partitioned with range subpartitions and was set up as follows:
 CREATE TABLE output_main_archive (
  EXTRACT_ID NUMBER
  , EXTRACT_DATE DATE
  , FIELD1 VARCHAR2(255)
 )
  PARTITION BY LIST (extract_id)
  SUBPARTITION BY RANGE (extract_date) 
  SUBPARTITION TEMPLATE 
  ( 
  SUBPARTITION ED_JAN_17 VALUES LESS THAN (TO_DATE('01-FEB-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_FEB_17 VALUES LESS THAN (TO_DATE('01-MAR-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_MAR_17 VALUES LESS THAN (TO_DATE('01-APR-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_APR_17 VALUES LESS THAN (TO_DATE('01-MAY-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_MAY_17 VALUES LESS THAN (TO_DATE('01-JUN-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_JUN_17 VALUES LESS THAN (TO_DATE('01-JUL-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_JUL_17 VALUES LESS THAN (TO_DATE('01-AUG-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_AUG_17 VALUES LESS THAN (TO_DATE('01-SEP-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_SEP_17 VALUES LESS THAN (TO_DATE('01-OCT-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_OCT_17 VALUES LESS THAN (TO_DATE('01-NOV-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_NOV_17 VALUES LESS THAN (TO_DATE('01-DEC-2017','DD-MON-YYYY'))
  , SUBPARTITION ED_DEC_17 VALUES LESS THAN (TO_DATE('01-JAN-2018','DD-MON-YYYY'))
  ) 
  (
  PARTITION ext_id_1 VALUES ( 1 ) 
  , PARTITION ext_id_2 VALUES ( 2 ) 
  , PARTITION ext_id_3 VALUES ( 3 ) 
  , PARTITION ext_id_4 VALUES ( 4 ) 
  , PARTITION ext_id_5 VALUES ( 5 ) 
  , PARTITION ext_id_6 VALUES ( 6 ) 
  , PARTITION ext_id_7 VALUES ( 7 ) 
  , PARTITION ext_id_8 VALUES ( 8 ) 
  , PARTITION ext_id_9 VALUES ( 9 ) 
  , PARTITION ext_id_10 VALUES ( 10 ) 
  ) 
 / 
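
With 10 list partitions and a 12-bucket subpartition template, the table is created with 120 range subpartitions. A quick dictionary query along the following lines will confirm this (a generic sanity check, not part of the original test):
 SELECT COUNT(*) AS subpartition_count
 FROM user_tab_subpartitions
 WHERE table_name = 'OUTPUT_MAIN_ARCHIVE';
 -- expect 120 (10 partitions x 12 subpartitions from the template)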
 
I loaded a source table with some random test data like so
 CREATE TABLE output_main (
  EXTRACT_ID NUMBER
  , EXTRACT_DATE DATE
  , FIELD1 VARCHAR2(255)
 )
 /
 
 INSERT /*+ APPEND */ INTO output_main
 SELECT TRUNC(DBMS_RANDOM.value(1,11))
  , TO_DATE('01/01/2017','DD/MM/YYYY') + DBMS_RANDOM.value(0,365)
  , DBMS_RANDOM.string('L',TRUNC(DBMS_RANDOM.value(100,255))) AS long_string
 FROM dual
 CONNECT BY level <= 100000
 /
 
 -- the APPEND (direct-path) insert above must be committed before
 -- output_main can be queried again, otherwise ORA-12838 is raised
 COMMIT;
 
 INSERT INTO output_main SELECT * FROM output_main;
 INSERT INTO output_main SELECT * FROM output_main;
 
 COMMIT;
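
A quick count (a minimal check, not in the original post) confirms the volume after the two doubling inserts:
 SELECT COUNT(*) FROM output_main;
 -- 400000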
 
This gives me 400K rows to play with, which makes the INSERT run long enough for me to gather some evidence. The first run is simply a copy of the rows from the source table to the partitioned table. I have also switched on tracing so I can see what is happening:
 exec DBMS_MONITOR.SESSION_TRACE_ENABLE
 
 INSERT INTO output_main_archive SELECT * FROM output_main;
 
 COMMIT;
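
The trace file written by the session can be located with a query like the following (a generic lookup, not part of the original post):
 SELECT value
 FROM v$diag_info
 WHERE name = 'Default Trace File';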
 
I examined the trace files using three methods:
  1. tkprof - Oracle provided trace analysis tool
  2. tvdxtat.sh - A free tool written by Christian Antognini from Trivadis -> Introduce TVD$XTAT
  3. orasrp - A free tool written by Egor Starostin -> Oracle Session Resource Profiler
The last two tools produce HTML output, while tkprof produces text output. After parsing the trace file through tkprof I got the following output:
 INSERT INTO output_main_archive SELECT * FROM output_main
 
 
 call     count       cpu    elapsed       disk      query    current        rows
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 Parse        1      0.00       0.00          0          0          0           0
 Execute      1     23.97     113.81      16760     337419    1193762      400000
 Fetch        0      0.00       0.00          0          0          0           0
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 total        2     23.98     113.82      16760     337419    1193762      400000
 
We can see that the statement took just under two minutes on my little test system. Furthermore, using the TriVaDis eXtended Tracefile Analysis Tool 4.0, I can see that 15 recursive statements were executed. Here they are:
 update seg$ ...
 select ... from seg$ ...
 delete from deferred_stg$ ...
 delete from seg$ ...
 update partobj$ ...
 insert into seg$ ...
 update tabsubpart$ ...
 select ... from seg$ ...
 delete from tab_stats$ ...
 select ... from deferred_stg$ ...
 select ... from RecycleBin$ ...
 select ... from undo$ ...
 select ... from sys.obj$ o, sys.user$ u, sys.trigger$ t, sys.obj$ bo ...
 select ... from opt_directive_own$ ...
 select ... from hist_head$ ...
 
Now for the test run where I will ALTER the table during my INSERT. The ALTER statement is as follows
 ALTER TABLE output_main_archive ADD
 PARTITION ext_id_null VALUES (NULL)
  , PARTITION ext_id_unknown VALUES (DEFAULT)
 /
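
Once the ALTER completes, the two new partitions can be confirmed with a quick dictionary query (a generic check, not part of the original test):
 SELECT partition_name, high_value
 FROM user_tab_partitions
 WHERE table_name = 'OUTPUT_MAIN_ARCHIVE'
 ORDER BY partition_position;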
 
During the course of the run I will check the transactions using the following SQL
 set lines 140
 set pages 140
 
 col osuser for a15
 col username for a15
 col logon_time for a25
 col transaction_date for a25
 
 select s.sid
  , s.serial#
  , s.osuser
  , s.username
  , t.used_ublk
  , to_char(s.logon_time,'DD-MON-YYYY HH24:MI:SS') logon_time
  , to_char(t.start_date,'DD-MON-YYYY HH24:MI:SS') transaction_date
 from v$session s
  , v$transaction t
 where s.taddr = t.addr
 order by t.start_date;
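
Depending on how long the underlying table scan takes, the progress of the INSERT ... SELECT may also be visible in V$SESSION_LONGOPS (an optional check, not part of the original script):
 SELECT sid, opname, target, sofar, totalwork, time_remaining
 FROM v$session_longops
 WHERE time_remaining > 0;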
 
Here's my sequence of events
  1. Session 1 : Perform the INSERT into the partitioned table
  2. Session 2 : Examine the transaction
  3. Session 2 : ALTER the partitioned table
  4. Session 2 : Examine the transaction
  5. Session 1 : Watch the transaction complete and commit
I started the INSERT command, and the following is the output from my transaction script:
 SYS@LUKEPDB1(LUKE) SQL> @trans
 
        SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
 ---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
        254      17710 oracle          PART                   309 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18
 
 1 row selected.
 
Note the USED_UBLK figure: this is the number of undo blocks being used by the transaction. After pausing for a short while I ran it again:
 SYS@LUKEPDB1(LUKE) SQL> @trans
 
        SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
 ---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
        254      17710 oracle          PART                  1542 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18
 
 1 row selected.
 
We can see that the UNDO usage is rising, as expected. I then performed the ALTER statement detailed above from the second session and continued to monitor the UNDO usage. Here's what I saw:
 SYS@LUKEPDB1(LUKE) SQL> @trans
 
        SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
 ---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
        254      17710 oracle          PART                  1244 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18
 
 1 row selected.
 
 SYS@LUKEPDB1(LUKE) SQL> /
 
        SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
 ---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
        254      17710 oracle          PART                    99 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18
 
 1 row selected.
 
 SYS@LUKEPDB1(LUKE) SQL> /
 
        SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
 ---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
        254      17710 oracle          PART                    37 28-MAR-2018 13:03:53      28-MAR-2018 13:05:18
 
 1 row selected.
 
 SYS@LUKEPDB1(LUKE) SQL> /
 
        SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
 ---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
        254      17710 oracle          PART                    99 28-MAR-2018 13:03:53      28-MAR-2018 13:06:34
 
 1 row selected.
 
 SYS@LUKEPDB1(LUKE) SQL> /
 
        SID    SERIAL# OSUSER          USERNAME         USED_UBLK LOGON_TIME                TRANSACTION_DATE
 ---------- ---------- --------------- --------------- ---------- ------------------------- -------------------------
        254      17710 oracle          PART                   230 28-MAR-2018 13:03:53      28-MAR-2018 13:06:34
 
 1 row selected.
 
We see USED_UBLK fall and then rise again, showing that the existing UNDO was applied during a rollback before the statement restarted and began generating UNDO afresh. Note also that TRANSACTION_DATE changes from 13:05:18 to 13:06:34, suggesting a new transaction was started when the statement replayed. Once again I had traced the session, and here are the results from tkprof:
 INSERT INTO output_main_archive SELECT * FROM output_main
 
 
 call     count       cpu    elapsed       disk      query    current        rows
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 Parse        1      0.01       0.01          0          0          0           0
 Execute      1     76.44     168.62      23859     497949    2153572      400000
 Fetch        0      0.00       0.00          0          0          0           0
 ------- ------  -------- ---------- ---------- ---------- ----------  ----------
 total        2     76.46     168.63      23859     497949    2153572      400000
 
We can see that the elapsed time was significantly larger, rising from 113.82s to 168.63s. Here are the main differences in CPU and wait events:
 
 Component                    1st Run               2nd Run
                              Total Duration [s]    Total Duration [s]
 
 CPU                          23.983                76.301
 free buffer waits            44.404                48.304
 db file sequential read       5.934                21.437
 recursive statements          4.230                 8.176
 
And the largest difference, shown in the orasrp output, is that the 15 recursive statements increased to 922 statements (146 of them unique) for the interrupted INSERT. One other thing, also shown by orasrp, was the extra work against the UNDO datafile: the number of reads on that file (file ID 10 in my case) increased roughly tenfold, and the read time rose even more sharply.
 Session Read Datafiles Statistics
 
                    1st Run               2nd Run
 Datafile #      Seconds     Calls      Seconds     Calls
 
         11      7.0077s     6,091     22.6365s     7,730
          9      0.0469s       122      0.1172s       123
         10      0.0007s        46      1.7878s       484
          8      0.0002s         9      0.1110s        49
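
The file numbers above can be mapped to tablespaces, confirming that file 10 is the UNDO datafile on this system, with a query along these lines (the file numbers will differ on other systems; this check is not part of the original post):
 SELECT file_id, tablespace_name, file_name
 FROM dba_data_files
 ORDER BY file_id;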
 
So this is some evidence that Oracle handles certain situations for us: instead of either hanging the DDL or failing the INSERT, both statements complete successfully, admittedly after using a lot more resources than first anticipated.
