Pythian Blog: Technical Track

Enqueue bytes - is that a pun?

Sometimes it is necessary to put on your uber-geek hat and start using cryptic bits of code to retrieve information from an Oracle database. Troubleshooting enqueue locking events in Oracle databases is one of the times some advanced SQL may be necessary. Likely you have used SQL similar to the following when troubleshooting Oracle enqueue's, probably in connection with row lock contention.
 SQL# l
  1 select
  2 s.username username,
  3 s.sid,
  4 e.event event,
  5 e.p1text,
  6 e.p1,
  7 e.state
  8 from v$session s, v$session_wait e
  9 where s.username is not null
  10 and s.sid = e.sid
  11 and e.event like '%enq:%'
  12* order by s.username, upper(e.event)
 
 USERNAME SID EVENT P1TEXT P1 STATE
 ---------- ---- ------------------------------ ---------- ------------- ----------
 JKSTILL 68 enq: TX - row lock contention name|mode 1415053318 WAITING
 
 1 row selected.
 
The value for P1 is not very useful as is; Oracle has encoded the type of enqueue and the requested mode into the column. When working with current events such as when selecting from v$session, it is simple to determine the type of lock and the mode requested by querying v$lock, such as in the following example:
  1* select sid, type, request, block from v$lock where sid=68 and request > 0
 SQL# /
 
  SID TY REQUEST BLOCK
 ---- -- ---------- ----------
  68 TX 6 0
 
 1 row selected.
 
Session 68 is waiting on a TX enqueue with requested lock mode of 6. Seasoned Oracle DBA's will recognize this as classic row lock contention. Why bother to find out just which type of enqueue this is? There are many types of locks in Oracle, and they occur for differing reasons. The TX lock is interesting as it can occur not only in Mode 6 but Mode 4 as well; Mode 4 refers to locks that involve unique keys, such as when 2 or more sessions try to insert the same value for a primary key. The following example shows just that:
 SQL# @s
 
 USERNAME SID EVENT P1TEXT P1 STATE
 ---------- ---- ------------------------------ ---------- ------------- ----------
 JKSTILL 68 enq: TX - row lock contention name|mode 1415053316 WAITING
 
 
 1 row selected.
 
 SQL# @l
 
  SID TY REQUEST BLOCK
 ---- -- ---------- ----------
  68 TX 4 0
 
 1 row selected.
 
 
Knowing just which lock mode is requested is vital, as the troubleshooting for TX Mode 4 locks will be different from what is used to troubleshoot Mode 6. Though we can find the lock name and mode information in v$lock, there is still value in being able to decipher that cryptic P1 column. The ASH and AWR facilities do not include any historical information for the lock name and mode; the P1 column found in v$active_session_history and dba_hist_active_sess_history does not have a corresponding dba_hist_lock view. Any research done after an event has occurred does require decoding this information.

Deciphering v$session.p1

Oracle Support document 34566.1 is the enqueue reference note that provides information needed to get the lock name and mode from the p1 column. As you will see this information is a bit puzzling. The rest of this article will focus on TX Mode 6 locks. The value shown for this lock and mode in the P1 column is always 1415053318. Following is the SQL recommended by Oracle:
  SELECT chr(to_char(bitand(p1,-16777216))/16777215)||
  chr(to_char(bitand(p1, 16711680))/65535) "Lock",
  to_char( bitand(p1, 65535) ) "Mode"
  FROM v$session_wait
  WHERE event = 'enqueue'
 
As I currently have some planned row lock contention in a test database we can run this query:
  1 SELECT chr(to_char(bitand(p1,-16777216))/16777215)||
  2 chr(to_char(bitand(p1, 16711680))/65535) "Lock",
  3 to_char( bitand(p1, 65535) ) "Mode"
  4 FROM v$session_wait
  5* WHERE event like '%enq:%'
 SQL# /
 
 Lo Mode
 -- ----------------------------------------
 TX 4
 
 
Probably it is not very clear why this SQL works. Let's try and understand it. (Note that a small change had to be made to the WHERE clause.) Converting the P1 value to hex may be helpful
 1415053318 = 0x54580006
 
The two lower order bytes represent the lock mode that has been requested. This can be seen here to be 0x06, which is simple translate to decimal 6 (I can do this one in my head) The next two bytes are also in hex and represent the two letters of the lock name. 0x54 = 84 = 'T' 0x58 = 88 = 'X' Using string functions it is simple to extract the values from the hex string, convert them to numbers and retrieve the lock name and mode.
 SQL# define p1 = 1415053318
 
 SQL# l
  1 with hex as (
  2 select trim(to_char(&p1,'XXXXXXXXXXXXXXXX')) hexnum from dual
  3 ),
  4 hexbreak as (
  5 select hexnum
  6 , to_number(substr(hexnum,1,2),'XXXXXXXX') enq_name_byte_1
  7 , to_number(substr(hexnum,3,2),'XXXXXXXX') enq_name_byte_2
  8 , to_number(substr(hexnum,5),'XXXXXXXX') enq_mode
  9 from hex
  10 )
  11 select
  12 hexnum
  13 , chr(enq_name_byte_1)
  14 || chr(enq_name_byte_2) enqueue_type
  15 , enq_mode
  16* from hexbreak
 SQL# /
 
 HEXNUM EN ENQ_MODE
 ----------------- -- ----------
 54580006 TX 6
 
 
While that does work, my inner geek wants to investigate those bitmasks and find out why they work. Next are the bitmasks in decimal along with the hex equivalent.
 -16777216 = 0xFFFFFFFFFF000000
  16777215 = 0xFFFFFF
  16711680 = 0xFF0000
  65535 = 0xFFFF
 
The bitand function is used to mask all unwanted bits to 0. The number is then divided by value needed to remove all of the now-zeroed-out lower order bytes. The values being used as bitmasks are -16777216 and 16711680. The use of -16777216 does not seem to make sense. As the intent is to mask all but one byte, I would expect to find an FF surrounded by a number of zeroes. The bit mask of 16711680, however, looks fine. Now let's run the Oracle support query again, but modified to show just the integer values rather than converting them to ASCII.
 
  1 SELECT bitand(p1,-16777216)/16777215,
  2 bitand(p1, 16711680)/65535,
  3 bitand(p1, 65535)
  4 FROM v$session_wait
  5* WHERE event like '%enq:%'
 SQL# /
 
 BITAND(P1,-16777216)/16777215 BITAND(P1,16711680)/65535 BITAND(P1,65535)
 ----------------------------- ------------------------- ----------------
  84.000005 88.0013428 6
 
Well, that is interesting. An implicit conversion is taking place with to_char() that is removing the decimal portion of these numbers. Is that being done with trunc(), round(), or something else? I don't know the answer to that. What seems more important is just doing the math correctly. There are a couple of things here that can be changed to make this work as expected.

A New BitMask

Let's modify the first bitmask to something that seems more reasonable than -16777216. Let's use this instead, as it masks only the single byte we need:
 4278190080 = 0xFF000000
 
Lets' try it out:
 SQL# l
  1 SELECT bitand(p1,4278190080)/16777215,
  2 bitand(p1, 16711680)/65535,
  3 bitand(p1, 65535)
  4 FROM v$session_wait
  5* WHERE event like '%enq:%'
 SQL# /
 
 BITAND(P1,4278190080)/16777215 BITAND(P1,16711680)/65535 BITAND(P1,65535)
 ------------------------------ ------------------------- ----------------
  84.000005 88.0013428 6
 
While the new bitmask didn't break anything, it does not appear to have helped either.

Off By One Error

The solution is to consider the divisors used to remove the lower order zeroes; each of them is off by one. That is easy enough to verify:
 SQL# l
  1 SELECT bitand(p1,4278190080)/16777216,
  2 bitand(p1, 16711680)/65536,
  3 bitand(p1, 65535)
  4 FROM v$session_wait
  5* WHERE event like '%enq:%'
 SQL# /
 
 BITAND(P1,4278190080)/16777216 BITAND(P1,16711680)/65536 BITAND(P1,65535)
 ------------------------------ ------------------------- ----------------
  84 88 6
 
Ah, that did it! But what was the problem previously?

Old Divisor Values

The original divisors are off by 1, which does not completely remove the lower order values.
  16777215 = 0xFFFFFF
  65535 = 0xFFFF
 
Increasing each by one has the desired effect.

New Divisor Values

  16777216 = 0x1000000
  65536 = 0x10000
 

Conclusion

Those odd bitmasks have been in the back of my mind for some time, and today it seemed like a good idea to dig in and find out more about them. It isn't too hard to imagine that in some cases the wrong values might be returned, leading to some long and unproductive troubleshooting sessions. There is a demo script enqueue-bitand.sql containing much of the SQL found in this article. There is also a script awr-top-sqlid-events.sql that incorporates the enqueue lock decoding. This script could be made better than it is, so please issue a pull request if you have some useful modifications. Also, as commenter Dominic Brooks has pointed out, this would be much simpler to do just by using mod(p1,16).

No Comments Yet

Let us know what you think

Subscribe by email