Pythian Blog: Technical Track

Oracle Trace Files in User Directories

I encountered an unusual situation recently where Oracle was writing trace files into the user directory instead of the usual oracle diagnostic destinations. Our monitoring software indicated that the "/home" directory was filling up. When I checked the space, I saw it had become 85% full. $ df /home Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/home 20642428 16503552 3090300 85% /home Digging further, I found the Pythian user was using a large amount of space. This was very unusual since the monitoring software we use has a very small footprint and the log files rotate so as not to use a lot of disk space. $ cd home $ du -sh * 392K bturner 16K lost+found .... 4.5G pythian ... Going even further, I found that an unfamiliar directory "oradig_pythian" had been created. I had never seen this directory before, and it was not related to our monitoring software. $ cd pythian $ du -sh * 2.4M avail 11M logs 4.3G oradiag_pythian ... Drilling down the "oradiag_pythian", I found an usually named directory "host_1460457767_80", which was full of subdirectories that were similar in name to the Oracle diagnostic directories. $ pwd /home/pythian/oradiag_pythian/diag/clients/user_pythian/host_1460457767_80 $ du -sh * 2.9G alert 4.0K cdump 4.0K incident 4.0K incpkg 4.0K lck 276K metadata 4.0K metadata_dgif 4.0K metadata_pv 4.0K stage 4.0K sweep 1.4G trace The only file in the "trace" directory was a sqlnet.log file. This 1.4GB file was being written to many times per minute, complaining of a missing Oracle diagnostic directory. $ ls -ltr trace | tail -3 total 1466924 -rw-rw---- 1 pythian pythian 1500659093 Mar 13 09:36 sqlnet.log $ cd trace [pythian@bsn-s-ora11g-1 trace]$ tail -10 sqlnet.log Wed Mar 13 09:42:04 2013 Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Wed Mar 13 09:42:04 2013 Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Wed Mar 13 09:42:05 2013 Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Wed Mar 13 09:42:05 2013 Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Wed Mar 13 09:42:05 2013 Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] In addition, the "alert" directory was full of "log.xml" files. $ ls -ltr alert | tail -3 -rw-r----- 1 pythian pythian 10485849 Mar 10 14:43 log_293.xml -rw-r----- 1 pythian pythian 10485849 Mar 12 19:05 log_294.xml -rw-r----- 1 pythian pythian 2915094 Mar 13 09:36 log.xml $ ls -lt alert | tail -3 -rw-r----- 1 pythian pythian 10485849 Jul 16 2012 log_3.xml -rw-r----- 1 pythian pythian 10485849 Jul 15 2012 log_2.xml -rw-rw---- 1 pythian pythian 10485895 Jul 14 2012 log_1.xml Checking the contents of the most recent log.xml file also showed that log files were being written to and being filled up with messages of the missing directory. $ tail -40 log.xml host_addr='10.20.30.40'> Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] In checking for directory, the first directory "/u01/app/oracle/product/11.2.0/db_1/log" did exist. The second directory "/u01/app/oracle/product/11.2.0/db_1/log/diag/clients" did not exist. $ ls -l /u01/app/oracle/product/11.2.0/db_1/log total 8 drwxr-xr-t 3 oracle oinstall 4096 May 10 2012 client_hostname drwxrwxr-x 3 oracle oinstall 4096 May 10 2012 diag $ ls -l /u01/app/oracle/product/11.2.0/db_1/log/diag/clients ls: /u01/app/oracle/product/11.2.0/db_1/log/diag/clients: No such file or directory $ cd /u01/app/oracle/product/11.2.0/db_1/log/diag $ ls -l total 8 -rw-r----- 1 oracle oinstall 16 Mar 6 14:32 adrci_dir.mif drwxr-x--- 2 oracle oinstall 4096 Nov 18 22:21 rdbms I decided to create the directory and see what happened. Nothing happened. And by "nothing", I mean "nothing" in a good way. Oracle stopped writing to the sqlnet.log file and the log.xml files to indicate the directory was missing. At the same time, nothing was created in the new clients directory. $ cd /u01/app/oracle/product/11.2.0/db_1/log/diag $ mkdir clients $ cd clients $ ls Seems like it was spending a lot of time complaining about a missing directory that it did not need to use. So why did Oracle create the "oradiag_pythian" directory and start complaining about the missing "client" directory? I was unable to find anything on metalink that would explain this behavior. Thankfully, I still had the FIRST log.xml file that was created, and there was some clue there as to why these files were created. $ head -40 log_1.xml Create Relation ADR_CONTROL Create Relation ADR_INVALIDATION Create Relation INC_METER_IMPT_DEF Create Relation INC_METER_PK_IMPTS Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] Directory does not exist for read/write [/u01/app/oracle/product/11.2.0/db_1/log] [/u01/app/oracle/product/11.2.0/db_1/log/diag/clients] <msg org_id='oracle' /='' time='2012-06-18T13:41:01.773-04:00' type='UNKNOWN' level='16' host_id='10.20.30.40' Using the information in this first log file, I was able to find a similar incident that had occurred for someone else, and they had been able to track it down to Oracle bug 58917 (which I could not access on Metalink). In the related incident ( https://savannah.cern.ch/bugs/?58917 ), they noted that this had occurred following an ORA-24550.
For the record: the Oracle signal handler producing these ORA-24550 can probably be completed disabled (e.g. if one wants to use the ROOT signal handler) by setting DIAG_SIGHANDLER_ENABLED=FALSE.

No Comments Yet

Let us know what you think

Subscribe by email