Pythian Blog: Technical Track

Cursor: Pin S Wait on X in the Top 5 Wait Events

Lately, for some reason, I have started to come across the “cursor: pin S wait on X” wait event in the top 5 wait events in AWR reports. It was as if someone wanted to tell me that I was wrong in my last post ( Locks, Latches, Mutexes and CPU usage), stating that a significant mutex contention should be rarely experienced. It started showing itself with DB time percentages, waiting on this with some systems going to almost 20%! So what exactly is the root cause? Why is so much time spent waiting on a busy mutex when it should protect just one cursor? As a troubleshooting example, I will explore a system spending 14.22% (a bit of an extreme case) of its DB time waiting on cursor: pin S wait on X. It is the Oracle RDBMS 11.2.0.2 system. Let’s start by dissecting the wait event itself. We don’t see mutex in the wait event name, but we know that mutexes are used to protect the cursor in this release. It tells us that a session is waiting to put the cursor mutex in a (S)hared mode, but the mutex is already held in e(X)clusive mode. So when is a cursor mutex needed in a shared state and when in an exclusive one? When a new cursor is given to Oracle to execute, it first needs to check if such a cursor already exists in the Shared Pool, so we will not need to go to the expensive process of hard parsing it again. To check this quickly, a hash value is generated from the text of the SQL. For example, the statement:

select 'bla' from dual 

has a hash value of 957527304:

SQL> select sql_text from v$sqlarea where hash_value=957527304

SQL_TEXT

------------------------------

select 'bla' from dual

 

If the same statement is executed again, it will have the same hash value, and its cursor will be quickly found in the Shared Pool based on that hash value. But there is more...

Cursor Versions (Children)

Each cursor can have more children (or versions) for the same SQL statement and hash value. They are linked trough a list to the parent. The cursor hash value itself is the cursor parent, and there is at least one child for each cursor (child number 0), which would contain the details for the first SQL execution. Each cursor child contains different metadata generated for subsequent executions if the same SQL was executed in different circumstances.

A simple reason for having a different version would be:

UserA has a table T in his schema and UserB doesn’t, but say there is a public synonym “T”, which points to a UserC.T. So UserA and USerB would execute the same statement (for example select * from T), which would hash to the same value and point to the same cursor. We obviously cannot just reuse the already parsed statement for UserA, as UserB might not even have permission to access the UserC.T table. We will access a completely different object than UserA did and generate a different exaction plan as well. So after the statement hashes to the same value, it needs to go through the cursor children for that statement to find if this particular SQL was already executed in the same scenario as this one. To see if we find a child like that being executed in the same context (same cursor metadata) as UserB wants to execute it:

If a match is found among the cursor children, that cursor will be used to execute the statement, and this would be a so called soft parsing. Otherwise, it will need to be parsed from scratch and will result in a hard parse, while a new cursor child will be born and added to the cursor linked list. There are quite a few reasons why a cursor cannot be shared and why a different child will be created. An example is when a cursor child is created with optimal execution plan for different bind variable value used in the SQL execution, preventing a so called “bind peaking” due to Adaptive Cursor Sharing in 11.2. Cursor children and the reason for their existence can be examined trough the V$SQL_SHARED_CURSOR, which for a specific SQL_ID will show ‘Y’ in the column for the specific reason as well as having a partially descriptive column called REASON. More details on each and every reason for a child cursor's existence can be found on MOS.

 

Soft and Hard Parsing Impact

So what impact do cursor versions have on cursor mutexes and contention for them? Well, while scanning the cursor children linked-list, we need to pin these children with a shared cursor mutex, so they don’t get purged from the shared pool during scanning. It obviously will be longer if the linked list we need to go through is long. If we have a large number of a particular SQL execution which leads to JUST soft parsing, meaning we successfully reuse cursors, we can experience many brief waits on the cursor: pin S wait event. This is due to multiple sessions trying to increase/decrease the “interested sessions” reference count found in the shared mutex value. This operation requires for this update to be atomic in nature, not allowing multiple sessions to change this mutex value at the same time, so sessions must wait until the other sessions completely finishes with the operation of getting the current mutex value and incrementing or decrementing it. Chances of this will be increased if we have a higher number of children per cursor, each with its own mutex, which will need to be locked in a shared mode while sessions scan trough them. When we can’t find a suitable match between the cursor children, we need to create a new one. As I said, creating a new one involves hard parsing (syntax and semantic check of the SQL, checking user privileges on objects in the SQL, creating a new exaction plan...). While we do this, we must prevent another session from creating the same type of cursor child. We also need to prevent other sessions to get this mutex in a shared mode, since it is still not ready to be reused. So, we must hold an exclusive lock on the cursor child mutex. While holding the mutex in X mode, if other sessions are trying to execute the same SQL with the same context (cursor metadata), they will find that a child like that is there but still being held in X mode. Therefore, they will wait to get it in a shared (S) mode to reuse it while we see the session waiting on a cursor: pin S wait on X.

In this situation, again, a high number of cursor children being generated would require a higher number of hard parsing, putting the particular mutex for that child in X mode, while other sessions will come ask to get that mutex in S mode in order to scan it – this increases the chances of seeing the event cursor: pin S wait on X.

In both of these cases, the wait time of either cursor: pin S and cursor: pin S wait on X (per event) should be quite small, or under 10 ms, as recommended by Oracle.

To check if this is the case, the SQL hash value can be found from the value of the MUTEX_IDENTIFIER with highest gets and sleeps from v$mutex_sleep_history:

 select MUTEX_IDENTIFIER, MUTEX_TYPE, max(gets), max(sleeps), mutex_value  from v$mutex_sleep_history

where mutex_type='Cursor Pin'

group by MUTEX_IDENTIFIER, MUTEX_TYPE, mutex_value

order by 3, 4

SQL> /

MUTEX_IDENTIFIER MUTEX_TYPE MAX(GETS) MAX(SLEEPS) MUTEX_VA

---------------- -------------------------------- ---------- ----------- --------

487379649 Cursor Pin 9 1 00280000
4063208512 Cursor Pin 1 1 002A0000
3607215236 Cursor Pin 1 1 002A0000
3873422482 Cursor Pin 9 3 00260000
957527304 Cursor Pin 522847 4 00220001

SQL> select sql_id, sql_text, version_count from v$sqlarea where hash_value=957527304

SQL> /

SQL_ID SQL_TEXT VERSION_COUNT

------------- -------------------------------------------------- -------------

6b83d7cwj5d88 select 'bla' from dual 1

Some arbitrary low number of cursor versions, as recommended by Oracle, would be less than 20.

Higher hard parsing and not many cursor versions

There are two possibilities here:

  1. Having a large number of hard parsing will of course mean that there are greater chances that a particular mutex for a specific cursor child will be held in X mode, while other interested sessions will wait to get in S mode. But wait – if there are so many interested sessions for the same cursor child, shouldn’t it already be in the shared pool and just be reused (through soft parsing)? Well, not if it is getting flushed from the shared pool constantly due to a lack of space or if the shared pool is being manually flushed for some reason.
  2. There is a small number of hard parses per SQL, but hard parsing takes longer than usual. This would be indicated as a smaller number of the cursor: pin S wait on X event, but has longer wait times. Another indication for this would be having a small number of GETS compared to SLEEPS for a particular MUTEX_IDENTIFIER in v$mutex_sleep_history. This means that the specific mutex were needed in S mode but were not able to reach that due to it already being in the X state – this is the number of GETS. As described in my previous post about mutexes, the mutex will go to sleep for some time (_mutex_wait_time init parameter) and then wake up. If the mutex is still in X mode, it will go to sleep again. The number of times it does this is reflected in the columns SLEEPS – meaning it was held for long time in X mode – which again means it took a long time to be hard parsed.
An example of this:
  Wait Event Wait Time Summary Avg Wait Time (ms)
I# Class Event Waits %Timeouts Total(s) Avg(ms) %DB time Avg Min Max Std Dev Cnt
 
3   DB CPU     53,797.11   28.78          
  Concurrency cursor: pin S wait on X 39,661 0.00 26,577.32 670.11 14.22          
  Concurrency library cache lock 19,439 0.59 12,909.70 664.11 6.91          
  User I/O cell smart table scan 1,284,212 26.06 10,315.90 8.03 5.52          
  Other PX Deq: reap credit 10,709,617 100.05 6,486.36 0.61 3.47          
  Other PX qref latch 1,351,538,727 101.76 5,731.03 0.00 3.07          
  Concurrency library cache: mutex X 33,161 0.00 3,876.05 116.89 2.07          
  Concurrency latch: cache buffers chains 84,209 0.00 3,614.56 42.92 1.93          
  User I/O cell single block physical read 2,851,163 0.00 3,144.54 1.10 1.68          
  Cluster gc buffer busy acquire 776,453 0.00 1,546.04 1.99 0.83          
                                                   

In this example, we see quite an enormous wait time for the cursor: pin S wait on X of 670 ms – and this is average – meaning that on average, it was taking 670 ms for statements to be hard parsed. Long parsing, as I have experienced, tend to be due to different bugs that can be found on MOS related to the cursor: pin S wait oc X event and found related to a specific LOCATION from v$mutex_sleep_history, indicating the Oracle function where the waiting was done. Bugs are usually associated with a specific type of SQL being executed, so it is good to find the SQLs that were being hard parsed. As is often the case, this has happened maybe few hours ago and we need to find the cause now, so we will use the Workload Repository tables:

select sql_id, count(*) from dba_hist_active_sess_history 

where snap_id=&snap_id <= Snap_id got from the DBA_HIST_SNAPSHOT, for the specific time.

and event='cursor: pin S wait on X'

group by sql_id

order by 2

SQL> / SQL_ID COUNT(*) ------------- ---------- 36rqr5grvqn6k 1 1n3gdnr84hnkh 1 gk2yqccsn75hs 3 9x24gfdtgkg27 4 7fzk0rhz8tg9n 5 7205ntynavtdk 8 9hd8sk6xsuuxa 10 1dsdc6gdfavh0 14 grt072zxh2a33 31

 

As we see, the SQL_ID grt072zxh2a33 is the main suspect for mutex contention as it was most often being executed when we were waiting to get the specific mutex in a shared mode while it was held in exclusive mode. This SQL was being hard parsed at the time. So the next step is to match a specific symptom from the list of possible bugs that causes long waits on the cursor:pin S wait on X wait event and long hard parsing, with the specific type of the SQL we had found and the specific Oracle version we are running. As seen from the My Oracle Support, it seems that when running Oracle RDBMS 11.2.0.2, it is best to either implement patch update 11.2.0.2.2 or better to use 11.2.0.3, where most of these mutex related bugs have been fixed.

No Comments Yet

Let us know what you think

Subscribe by email