Pythian Blog: Technical Track

Oracle GoldenGate Extract Internals, Part I

Since GoldenGate has been declared as a strategic direction for replication technology by Oracle, it sounds like it’s time to get up to speed with various aspects of how this technology works and performs.

As many of you are probably aware, up until recently, GoldenGate had been a third-party product. Technology-wise this presents an interesting challenge for the GoldenGate development team as they have to rely on whatever Oracle makes available to the outside world. Let’s see what kind of techniques they were able to utilize in order to achieve their goals.

I did a simple replication setup between two different databases with the Extract, DataPump and Replicat processes. I’m planning to take a look at all of these but today is the Extract’s process turn.

Oracle GoldenGate Extract Process
The main duty of the Extract process is to read and process Oracle redo logs in order to extract relevant changes and write these to a trail.

Reading from the Redo Logs
This is probably the most interesting aspect of the Extract process as this is where various technology stacks are being bridged together. Unless you configure Extract process to read strictly from the archived logs, it will try to read from online redo logs whenever possible. Let’s take a look at an strace of a running Extract process:

lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)                 = 0
...
read(18, "\1\"\0\0J\24\0\0U\0\0\0H\200\301\302\0\0\0\0\0\0\0\0\0\6\0\0\n\0\20\0"..., 1024000) = 1024000
...
lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)                 = 0
...
read(18, "\1\"\0\0J\24\0\0U\0\0\0H\200\301\302\0\0\0\0\0\0\0\0\0\6\0\0\n\0\20\0"..., 1024000) = 1024000
...
lseek(18, 2659328, SEEK_SET)            = 2659328
nanosleep({1, 0}, NULL)

I’ve omitted a bunch of syscalls as I’ll get to these a bit later plus it makes it much easier to see and understand the loop. Judging by the nanosleep syscall you can figure out that this cycle repeats every second. I’ve left file descriptor 18 in place since it’s what pointing out to our current redo log file:

[oracle@gg1 fd]$ ls -l 18
lr-x------ 1 oracle oinstall 64 Jan 19 16:18 18 -> /u02/oradata/GG1/onlinelog/o1_mf_1_5oco39l6_.log

Let’s take a closer look at what happens inside that loop. The loop begins with a lseek syscall which sets offset for the specified file descriptor. After a one second delay the Extract process performs a 1000K read starting at the offset sat by the lseek. You can see multiple lseek calls to set the offset to exactly the same position, one for each time the loop gets executed. This is necessary since the read syscall advances the current position in a file, hence we need an lseek syscall to get us back to where we’ve started.

As you’ve probably already guessed, this is nothing but a constant polling of the redo log file contents to see whether the new data has arrived. You can confirm that 2659328 is the current position in the online redo log by using a query against x$kcccp:

SQL> select cpodr_bno, cpodr_bno*512, (cpodr_bno-1)*512
  2  	from x$kcccp where indx=0;
 CPODR_BNO CPODR_BNO*512 (CPODR_BNO-1)*512
---------- ------------- -----------------
      5195       2659840           2659328

The Extract process starts reading from (and including) the last written block. When the new data arrives, the Extract process advances the offset appropriately:

lseek(18, 2729984, SEEK_SET)            = 2729984
nanosleep({1, 0}, NULL)                 = 0
...
read(18, "\1\"\0\0\324\24\0\0U\0\0\0\20\200b\227\204\0\0\0\5h\0\0\200\321\10\0\1\0w5"..., 1024000) = 1024000
...
lseek(18, 2731008, SEEK_SET)            = 2731008
nanosleep({1, 0}, NULL)                 = 0

The above polling continues until the Extract process detects that there is a supported operation in the redo stream which potentially requires capturing. Remember that redo records contain object ids, not object names. When the object identifier is encountered for the first time, the Extract process will issue a series of statements against the data dictionary to find out what it is. These queries are relatively lightweight, however, if you have a large number of objects inside your database then your data dictionary may became bombarded with these queries upon the Extract process start. The corresponding object name and type will be determined. In case of an index it will be checked whether it’s an IOT in order to resolve the table name properly.

Challenges
There are a number of interesting scenarios which may occur during online redo log reading. To begin with, you’re dealing with a file to which lgwr is actively writing. Processes inside an Oracle database use a vast amount of coordination to make sure they do not step on each other’s toes, however, when you’re a third-party process operating outside of the database then what’s left available to you is to constantly observe the situation and react accordingly while making sure you won’t get in anybody’s way. For example, it would be nice to ask lgwr to notify us when the new data had been written to the online redo log file, but we can’t. So we have to resort to a constant polling to see if something new has appeared. This is one of the areas to keep an eye on as GoldenGate integration with Oracle progresses over time.

Oracle redo logs are being written sequentially in the circular fashion which makes the task of reading these a bit less challenging, however, you may still find yourself in various nasty situations when you think about different possibilities. For example, while you’re reading the next 1000K, the lgwr might be writing to the same area so some (or all) of the data which you read back might be garbage. My guess would be that the Extract process just keeps reading forward until it encounters data that is no longer valid (due to invalid redo record or RBA not from the expected sequence) and then resumes from the last valid record encountered.

Here is something more interesting to think about. What if, while you’re reading the redo log file, lgwr manages to go the complete round (i.e. switch through all redo groups and get to the one you’re currently reading from) and overwrite the data you’re currently working with? This race condition is relatively unlikely to happen in a real life but still a possibility so I became curious to test this out.

I’ve blocked the Extract process, switched the redo logs around and made sure that the current online redo log gets overwritten with the new data past the current extract point. Here is an excerpt of the relevant data form the strace output for the Extract process:

lseek(18, 3027456, SEEK_SET)            = 3027456
nanosleep({1, 0}, NULL)                 = 0
fsync(19)                               = 0
lseek(15, 0, SEEK_SET)                  = 0
write(15, "H\0\0\10R\0\n\0\2\0\4\0A\0\1\0\1\0C\0\4\0\3\0\1\0S\0\30\0Sf"..., 2048) = 2048
fsync(15)                               = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
select(8, [7], NULL, NULL, {0, 0})      = 0 (Timeout)
read(18, "\1\"\0\0\31\27\0\0{\0\0\0\340\201\205N****************"..., 1024000) = 1024000
write(16, "\0$\0\0\6\0\0\0\0\0\3N3\7\0\0\0\2\0\0\0`\0\0\0\0\0\0\0\7\0011"..., 36) = 36
read(17, "\0\254\0\0\6\0\0\0\0\0\4\1\0\0\0002\24\1\0\0\0\0{\5\0\0\0\0\7\0\0\0"..., 8208) = 172
write(4, "2010-01-20 12:49:02.084  Redo th"..., 146) = 146
write(16, "\0.\0\0\6\0\0\0\0\0\3N4\16\0\0\0\1\0\0\0`\0\0\0\0\0\0\0\7\0031"..., 46) = 46
read(17, "\1\2\0\0\6\0\0\0\0\0\6\1\"\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 258
stat("/u02/fra/GG1/archivelog/2010_01_20/o1_mf_1_120_5ogjbvbj_.arc", {st_mode=S_IFREG|0640, st_size=3034112, ...}) = 0
open("/u02/fra/GG1/archivelog/2010_01_20/o1_mf_1_120_5ogjbvbj_.arc", O_RDONLY|O_DIRECT) = 20

Let’s go through what happened and how this situation has been handled. The loop starts with positioning the file descriptor at where we left off which is 3027456. After a one second delay, a 1000K redo log read follows. Upon startup, the Extract process establishes two connections with a database using whatever userid credentials you specified in the parameter file. File descriptors 16 and 17 are pointing out to one of these sessions (a dedicated Oracle processes in my case) which the Extract can use to query various information about the database. The query on line 12 checks whether the sequence and thread from which we’re currently reading got archived:

SELECT DECODE(archived, 'YES', 1, 0),       status  FROM v$log WHERE thread# = :ora_thread AND       sequence# = :ora_seq_no

In our case, that statement will return zero rows since this particular sequence (we’ve been reading seq# 120 when the wrap has occurred) is no longer there. My guess would be that that is one of the clues the Extract process might be using to deal with the situation. Another clue is by looking at what was read from the redo log file itself. Remember that the Extract process read includes the last written redo block which can potentially be compared with the same block from a previous read in order to detect if the wrapping has occurred. After realizing that the log has wrapped, the Extract process will write a warning into it’s report file (line 14):

2010-01-20 12:49:02.084  Redo thread 1: Online log /u02/oradata/GG1/onlinelog/o1_mf_2_5og4zot6_.log on sequence# 120 has missing trailing blocks.

The next step is to do a fall back and figure out where to pick up. This task is being accomplished by another query sent on line 15:

SELECT  name    FROM gv$archived_log   WHERE sequence# = :ora_seq_no AND         thread# = :ora_thread AND         resetlogs_id = :ora_resetlog_id AND         archived = 'YES' AND         deleted = 'NO' AND         name not like '+%'         AND standby_dest = 'NO'

The bind values are:

BINDS #14:
 Bind#0
  oacdty=01 mxl=32(03) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=96 off=0
  kxsbbbfp=2ba8c340f618  bln=32  avl=03  flg=05
  value="120"
 Bind#1
  oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=0 off=32
  kxsbbbfp=2ba8c340f638  bln=32  avl=01  flg=01
  value="1"
 Bind#2
  oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
  oacflg=21 fl2=1000000 frm=01 csi=178 siz=0 off=64
  kxsbbbfp=2ba8c340f658  bln=32  avl=09  flg=01
  value="708599728"

In other words, we’re looking into gv$archived_log to find out where the required sequence went to (note how ASM files will be bypassed by the means of name not like ‘+%’ condition). Once the name of the archived log file has been returned, the Extract process will proceed with opening and reading from it. After that, the Extract process will check whether the next sequence is available to read from the online log file:

SELECT lf.member,        DECODE(l.status, 'CURRENT', 1, 0),        DECODE(l.archived, 'YES', 1, 0)   FROM v$logfile lf, v$log l  WHERE lf.group# = l.group# AND        l.thread# = :ora_thread AND        l.sequence# = :ora_seq_no AND        lf.member not like '+%' AND        rownum = 1 AND        (lf.status NOT IN        ('INVALID','INCOMPLETE') OR        lf.status is null)

If the sequence is available, the online redo log file will be opened, otherwise the query against v$archived_log will be issued to find out the required archived log file to open. As I’ve already mentioned, the Extract process prefers to read from the online log file whether possible.

It was indeed a pleasure to see that that situation has been handled correctly. The warning emitted into a report file was nice to see as well (now you know one of the situations when this warning may appear). The GoldenGate development team has earned some credit, at least on this particular situation.

Stay tuned for the next part.

No Comments Yet

Let us know what you think

Subscribe by email