Alter session kill on steroids

6 min read
Aug 6, 2014

Perhaps you have encountered something like this: A session that is consuming too many resources needs to be killed. You locate the session and use ALTER SYSTEM KILL SESSION 'SID,SERIAL#' to kill the session. As you continue to monitor the database you find that the status of the session in v$session is 'KILLED', but the session does not go away. You also notice that the SERIAL# is continually changing. Now you find there is no OS process associated with the session, but the session continues as PMON is unable to finish cleanup for the session. Usually when this happens, the session will be holding a lock. When that happens, the only method to release the lock is to bounce the database. There are some bugs that may be responsible for this problem, such as this one described by Oracle Support: Pmon Spins While Cleaning Dead Process (Doc ID 1130713.1) This particular bug affects Oracle 10.2.0.1 - 11.1.0.7. I have personally seen this same behavior happen on many versions of the database from 7.0 on. To avoid these hanging sessions many DBA's have adopted the habit of first killing the OS process with an OS utility, and if the session is still visible in v$session, issue the ALTER SYSTEM KILL command. The OS command used on linux/unix is usually 'kill -9'. On windows it is OraKill. This method usually avoids the problems encountered when killing a session that is holding a lock and processing DML. I don't know just what circumstances trigger this behavior, as I have never been able to reproduce it at will. When it does happen though, it is more than annoying as the only way to clear locks held by the recalcitrant session is to bounce the database. Quite some time ago (at least as far back as Oracle 8i) Oracle introduced the new IMMEDIATE keyword to use with ALTER SYSTEM KILL SESSION. Using this keyword removes the need to use an OS command to kill a session - Oracle will do it for you! To test this I am using Oracle 10.2.0.4 on Oracle Linux 5.5. I have previously run these same tests in 11.2.0.3 with the same results. Had I access to an 8i or 9i database I would have run the tests there. To start with let's see what happens when a session is killed without the immediate keyword. Login to the session to be killed: $ sqlplus scott/tiger@10gr2 Login as SYSDBA from another terminal and check for scott's session: [sql]SQL> l 1 select 2 s.username, 3 s.sid, 4 s.serial#, 5 p.spid spid 6 from v$session s, v$process p 7 where s.username = 'SCOTT' 8* and p.addr = s.paddr SQL> / USERNAME SID SERIAL# SPID ------------------------------ ---------- ---------- ------------ SCOTT 133 35 22870 1 row selected. [/sql] All that has happened at this point is that Oracle has made an internal call that has disconnected Scott's session. (tracing that operation is a different topic.) The process on the server has not been terminated. This can be seen by the following experiment: Logon again as Scott. In a SYSDBA session check for Scott's: [sql] SQL> @scott USERNAME SID SERIAL# SPID ------------------------------ ---------- ---------- ------------ SCOTT 146 81 23678 [/sql] Now check for the shadow process associated with scott's session on the server: [root@ora10gR2 tmp]# ps -fp 23678 UID PID PPID C STIME TTY TIME CMD oracle 23678 1 0 16:56 ? 00:00:00 oraclejs01 (LOCAL=NO) Kill the session and check the status: [sql] SQL> alter system kill session '146,81'; SQL> l 1 select 2 s.username, 3 s.sid, 4 s.serial#, 5 p.spid spid 6 from v$session s, v$process p 7 where s.username = 'SCOTT' 8* and p.addr = s.paddr SQL>/ no rows selected [/sql] Check again on the server for the process: [root@ora10gR2 tmp]# ps -fp 23678 UID PID PPID C STIME TTY TIME CMD oracle 23678 1 0 16:56 ? 00:00:00 oraclejs01 (LOCAL=NO) Interesting, isn't it? We know the process is still alive on the server, but the session information is no longer associated with the process. This happens because Oracle has disconnected the session, which allows the process to continue until the sqlplus session is terminated. The session information is still available in v$session, but is no longer associated with a server process: [sql] select 2 s.username, 3 s.status, 4 s.sid, 5 s.serial# 6 from v$session s 7* where s.username = 'SCOTT' SQL>/ USERNAME STATUS SID SERIAL# ------------------------------ -------- ---------- ---------- SCOTT KILLED 146 81 1 row selected. 1* select pid,spid from v$process where pid = 146 SQL>/ no rows selected [/sql] When exiting the Scott session, I can see that the session was killed: [sql] SQL> exit ERROR: ORA-00028: your session has been killed [/sql] Let's perform the experiment again, but this time use the IMMEDIATE keyword. Logon as scott: > sqlplus scott/tiger@10gr2 SQL*Plus: Release 11.2.0.3.0 Production on Tue Aug 5 17:18:53 2014 ... Logon as SYSDBA and check for the scott session; [sql] SQL> @scott USERNAME SID SERIAL# SPID ------------------------------ ---------- ---------- ------------ SCOTT 146 83 23939 1 row selected. [/sql] Before killing scott's session:
  • get my OS PID
  • enable 10046 trace
The OS PID will be used for strace on the SYSDBA session shadow process on the server. The 10046 trace is so we can see what is happening in the strace output. [sql] SQL> l 1 select 2 s.username, 3 s.sid, 4 s.serial#, 5 p.spid spid 6 from v$session s, v$process p 7 where s.username is not null 8 and p.addr = s.paddr 9 and userenv('SESSIONID') = s.audsid 10* order by username, sid SQL>/ USERNAME SID SERIAL# SPID ------------------------------ ---------- ---------- ------------ SYS 145 65 23947 1 row selected. SQL> alter session set events '10046 trace name context forever, level 12'; Session altered. [/sql] Now ssh to the db server , check for Scott session shadow process and start strace: [root@ora10gR2 tmp]# strace -o 23947.strace -p 23947 ^Z [1]+ Stopped strace -o 23947.strace -p 23947 [root@ora10gR2 tmp]# bg [1]+ strace -o 23947.strace -p 23947 & [root@ora10gR2 tmp]# ps -p 23939 PID TTY TIME CMD 23939 ? 00:00:00 oracle Now kill Scott's session and exit the SYSDBA session: [sql] SQL> alter system kill session '146,83' immediate; System altered. [/sql] The strace command will now have exited on the server. First check again for Scott's session: [root@ora10gR2 tmp]# ps -p 23939 PID TTY TIME CMD [root@ora10gR2 tmp]# So the Scott shadow process has terminated. As the 10046 trace was enabled, the output to the oracle trace file will appear in the strace file, which allows searching for 'alter system kill' in the strace file. From the strace file: write(5, "alter system kill session '146,8"..., 44) = 44 Now searching for the PID of scott's session 23939: [sql] read(10, "23939 (oracle) S 1 23939 23939 0"..., 999) = 228 close(10) = 0 open("/proc/23939/stat", O_RDONLY) = 10 read(10, "23939 (oracle) S 1 23939 23939 0"..., 999) = 228 close(10) = 0 kill(23939, SIGKILL) = 0 kill(23939, SIGCONT) = 0 open("/proc/23939/stat", O_RDONLY) = 10 read(10, "23939 (oracle) Z 1 23939 23939 0"..., 999) = 178 close(10) = 0 [/sql] From the previous text I can see that Oracle opened the status file for PID 23939. Why it did so twice I am not sure. What happens after that is the interesting part. kill(23939, SIGKILL) = 0 That line means that the SIGKILL signal was successfully sent to Scott's shadow process. What does that mean? Run kill -l to get a list of signals:
kill -l
  1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
  6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
 11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
 16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
 21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
 26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
 31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
 38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
 43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
 53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
 58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
 63) SIGRTMAX-1 64) SIGRTMAX
 
Notice that signal 9 (kill -9) is SIGKILL. So when killing a session with ALTER SYSTEM KILL SESSION 'PID,SERIAL#' IMMEDIATE Oracle is actually doing the kill -9 for you, and has been for many years now. Though not shown here, this same test was run when the session was killed without using the IMMEDIATE keyword, and there were no attempts to kill the session. This was inferred as well by the fact the the process was still running on the server up until the time the Scott sqlplus session was exited.

Get Email Notifications

No Comments Yet

Let us know what you think