Pythian Blog: Technical Track

Oracle GoldenGate Extract Internals, Part II

Today we continue looking at various aspects of how the Oracle GoldenGate extract process works.

One of the follow up questions to part I was about the way the Extract process reads from ASM storage. I’ve provided the answer, however, today we’re going get a detailed look at how the Extract process interacts with an ASM instance and what kind of implications may result.

Let’s take a look at what changes in the Extract process loop when it reads from ASM storage:

nanosleep({1, 0}, NULL)                 = 0
...
write(18, "\37\333\0\0\6\0\0\0\0\0\3^\v)\4\4\0\1\0\0\0\376\377\377\377\377\377\377\377C\0\0"..., 8155) = 8155
write(18, "\37\333\0\0\6\0\0\0\0\0\1\1\0Z\7\n\0\0\0\0\0\1\2\0\0W\7\n\0\237\0\0"..., 8155) = 8155
write(18, "\37\333\0\0\6\0\0\0\0\0\1\0\4\0\0\0\237\0\0\1u\7\n\0\0\0\0\0\1\2\235\336"..., 8155) = 8155
write(18, "\22\367\0\0\6\0\0\0\0\0\0.\1\300\0\204\0\n\0\0\200\0\0\217\7\n\0\237\0\0\1\232"..., 4855) = 4855
read(18, "\37\333\0\0\6\0\0\0\0\0\v\1\5\276\4\0\0\0\0\0\1\0\0\0\0\0\0\0\260[\211\312"..., 8208) = 8208
read(18, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8155) = 8155
read(18, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8155) = 8155
read(18, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8155) = 4540
...
nanosleep({1, 0}, NULL)                 = 0

As before, to make the loop a bit more understandable, I’ve removed a bunch of syscalls. Let’s take a look at the file descriptor 18:

[root@gg1 fd]# ls -l 18
lrwx------ 1 oracle oinstall 64 Jan 24 20:56 18 -> socket:[24439]

Let’s find out more about this socket…

[root@gg1 fd]# lsof -p 7725 | grep 18u
extract 7725 oracle   18u  IPv6              24439                TCP [::1]:16428->[::1]:ncube-lm (ESTABLISHED)
[root@gg1 fd]# lsof -i :16428
COMMAND  PID   USER   FD   TYPE DEVICE SIZE NODE NAME
extract 7725 oracle   18u  IPv6  24439       TCP [::1]:16428->[::1]:ncube-lm (ESTABLISHED)
oracle  7753 oracle   16u  IPv6  24440       TCP [::1]:ncube-lm->[::1]:16428 (ESTABLISHED)
[root@gg1 fd]# ps -f -p 7753
UID        PID  PPID  C STIME TTY          TIME CMD
oracle    7753     1  0 20:56 ?        00:00:00 oracle+ASM (LOCAL=NO)

In place of the read syscalls there is now some sort of communication going on with ASM instance.

Alas, when it comes to tracing, an ASM instance is not the most friendly thing you can work with. But there are other ways. Enabling SQL*Net trace is enough to see what replaced the read syscall:

BEGIN dbms_diskgroup.read(:handle, :offset, :length, :buffer); END;

This is nothing other than one of the ASM internal packages which can be used to read directly from ASM disk group. How is the :handle being obtained? If we go a little bit up the trace, we can find the answer:

BEGIN dbms_diskgroup.getfileattr('+REDO/gg1/onlinelog/group_2.2547.708989191', :filetype, :filesize, :lblksize); END

The above code is used to get file attributes which are necessary to call the dbms_diskgroup.open procedure:

BEGIN dbms_diskgroup.open('+REDO/gg1/onlinelog/group_2.257.708989191', 'r', :filetype, :lblksize, :handle, :pblksize, :filesize); END;

This call will return a :handle which can later be used in dbms_diskgroup.read. The name of the online redo log file to read is being fetched by querying the RDBMS instance:

SELECT member,        DECODE(status, 'CURRENT', 1, 0),        DECODE(archived, 'YES', 1, 0)   FROM (select lf.member,        l.status,        1.archived        from v$logfile lf, v$log l        WHERE lf.group# = l.group# AND        l.thread# = :ora_thread AND        l.sequence# = :ora_seq_no AND        (lf.status NOT IN        ('INVALID','INCOMPLETE','STALE') OR        lf.status is null)        order by lf.member DESC )        where rownum = 1

Note that this query is a bit different form the one used when fetching the log file name on a file system. Both dbms_diskgroup.getfileattr and dbms_diskgroup.open calls happen when a new log file needs to be accessed (during startup or after a log file switch, for example), i.e. we do not execute these as part of a main loop listed above.

The next thing to find out is what happens in the ASM dedicated process as a result of dbms_diskgroup.read call:

read(16, "\37\333\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 8155
read(16, "\37\333\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 8208
read(16, "\1\0\0\0\0\0\0\0\0\0\0\v\1\33\5\0\0\31\2\2\r\0\0\0\0\0\0\372\0\300\0M"..., 8155) = 4803
pread(15, "\1\"\0\0\22 \0\0\21\0\0\0H\200\305\365\5\24\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 28672, 136324096) = 28672
write(16, "\37\333\0\0\6\0\0\0\0\0\v\1\5\326\4\0\0\0\0\0\1\0\0\0\0\0\0\0\240/V\232"..., 8155) = 8155
write(16, "\37\333\0\0\6\0\0\0\0\0\v\5\1\0\4\0\0\0\237\0\0\1g\251\v\0\0\0b\251\1\2"..., 8155) = 8155
write(16, "\37\333\0\0\6\0\0\0\0\0\0\2\0\3\0\4\0\5\0\6\0\7\0\10\0\t\0\n\0\v\0\f"..., 8155) = 8155

If you take a loot at the lsof output above, the file descriptor 16 is a socket back to the Extract process so these syscalls are related to SQL*Net messages. What we’re really interested in is a call on line# 4 with a pread syscall from the file descriptor 15:

[root@gg1 fd]# ls -l 15
lrwx------ 1 oracle oinstall 64 Jan 24 20:58 15 -> /dev/sdb1

Of course, /dev/sdb1 is a device under out REDO diskgroup:

SQL> select d.path, g.name
        from v$asm_disk d, v$asm_diskgroup g
        where d.group_number = g.group_number
                and d.path='/dev/sdb1';  2    3    4
PATH            NAME
--------------- ----------------
/dev/sdb1       REDO

What else can we say? First of all, the read size is significantly smaller, just 28672 bytes, compared to 1000K read size when the log has been located on a cooked file system. If you get such a small read size and the fact that the data needs to go through the network and SQL*Net stacks… I would say that I expect this to be much less efficient compared to how online redo logs are being read from a file system. One of the immediate things to realize is that, in case you’re running the Extract process on the same machine as your ASM instance, it probably makes total sense to configure the connection string to ASM instance (the one which is being specified in the Extract process parameters) using bequeath protocol so the traffic can go through a pipe instead of a socket which should provide better performance.

No Comments Yet

Let us know what you think

Subscribe by email