Pythian Blog: Technical Track

Troubleshooting Oracle's Auto Service Request

I've spent the better part of the day troubleshooting an issue with Oracle's Auto Service Request (ASR) and wanted to share my results in case it saves someone else some effort. The ASR manager is designed to be a site-wide aggregation point for ASR alerts: it receives SNMP traps and forwards them over HTTPS to transport.oracle.com. But if another process on a Linux host is already listening for SNMP traps on port 162, you may find that such traps are never sent to Oracle.

I was testing this by generating test traps through IPMI:

[code]
# ipmitool sunoem cli "set /SP/alertmgmt/rules/1 testrule=true"
Connected. Use ^D to exit.
-> set /SP/alertmgmt/rules/1 testrule=true
Set 'testrule' to 'true'
-> Session closed
Disconnected
[/code]

The resulting test trap should be passed on to Oracle and result in an e-mail noting that a test service request had been created. In my case, no e-mail ever arrived. /var/log/messages, however, did show that a test trap had been received:

[code]
Dec 19 16:12:23 asrmgr01 snmptrapd[14527]: 2013-12-19 16:12:23 testdb01.example.com [UDP: [43.218.200.118]:32957]: DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (51161892) 5 days, 22:06:58.92 SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.42.2.175.103.2.0.63 SNMPv2-SMI::enterprises.42.2.175.103.2.1.1.0 = STRING: "Oracle Database Appliance X3-2 1234ABC12B" SNMPv2-SMI::enterprises.42.2.175.103.2.1.14.0 = STRING: "1234ABC12B" SNMPv2-SMI::enterprises.42.2.175.103.2.1.15.0 = STRING: "SUN FIRE X4170 M3" SNMPv2-SMI::enterprises.42.2.175.103.2.1.20.0 = STRING: "This is a test trap"
[/code]

But none of the ASR manager logs in /var/opt/SUNWsasm/log showed any sign of activity.
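To confirm from the shell that a test trap actually reached the ASR manager host, a small helper can pull the relevant snmptrapd entries out of syslog. This is a sketch, not part of ASR or net-snmp: the function name find_test_traps is hypothetical, and it assumes snmptrapd logs via -Lsd to /var/log/messages in the format shown above, where the eighth whitespace-separated field is the host that sent the trap.

```shell
#!/bin/sh
# Hypothetical helper (not part of ASR or net-snmp): list test traps that
# snmptrapd logged to syslog, printing the syslog timestamp and sending host.
# Assumes snmptrapd runs with -Lsd and that daemon-facility messages land in
# /var/log/messages, in the format shown above.
find_test_traps() {
    logfile=${1:-/var/log/messages}
    # Keep only snmptrapd lines that carry the test-trap text, then print
    # fields 1-3 (syslog date and time) and field 8 (the trap's sender).
    grep 'snmptrapd\[' "$logfile" |
        grep -F 'This is a test trap' |
        awk '{ print $1, $2, $3, $8 }'
}
```

Run `find_test_traps` as root, since /var/log/messages is usually readable only by root. For the trap above it would print `Dec 19 16:12:23 testdb01.example.com`. A hit here combined with silence in /var/opt/SUNWsasm/log is exactly the symptom described in this post.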
After a lot of digging, including copious logfile reading, straces, and tcpdumps, I found that the ASR manager process was not even listening for SNMP traps:

[code]
[root@asrmgr01 log]# lsof -p `pidof java` | grep UDP
java 31318 root 93u IPv6 23334618 0t0 UDP *:41178
[/code]

Next, I searched for whatever was holding SNMP trap port 162, which lsof displays by its /etc/services name, "snmptrap":

[code]
[root@asrmgr01 log]# lsof | grep UDP | grep ":snmptrap"
snmptrapd 28163 root 8u IPv4 23357406 0t0 UDP *:snmptrap
[/code]

It's a different process entirely: snmptrapd.

[code]
[root@asrmgr01 log]# ps -ef | grep snmptrapd | grep -v grep
root 4986 1 0 Dec15 ? 00:00:04 /usr/sbin/snmptrapd -Lsd -p /var/run/snmptrapd.pid
[/code]

Decoding the command-line arguments, -Lsd sends "L"og messages to "s"yslog using the "d"aemon facility, and it was these messages I had seen in /var/log/messages. A little more digging in the ASR manager logfile /var/opt/SUNWsasm/log/sasm.log turned up a telling message:

[code]
2013-12-19_16:00:51 command executed: sasm start-instance
Starting Oracle Automated Service Manager...
Cannot bind to port : 162
[/code]

Unfortunately, sasm continued to start without reporting anything on stdout. It would have been much easier to diagnose if it had simply exited on a fatal error like this. Anyway, the fix was quite simple: disable snmptrapd on the ASR manager host and restart sasm:

[code]
chkconfig snmptrapd off
service snmptrapd stop
service sasm restart
[/code]

After that, my test traps started generating e-mail alerts as expected.
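Because sasm keeps running after failing to bind, a pre-flight check can catch this condition before you spend a day on straces. The sketch below is hypothetical (the function names are mine, not Oracle's). It assumes it runs as root, that lsof is installed, and that the ASR manager's JVM appears in lsof output with the command name java, as it does above. It can also serve as a post-fix check after `service sasm restart`.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the ASR manager's SNMP trap port.
# Assumptions: run as root on the ASR manager host, lsof is installed, and
# the ASR manager JVM is listed by lsof with COMMAND "java".

SASM_LOG=/var/opt/SUNWsasm/log/sasm.log

# Read `lsof -nP -iUDP:162` output on stdin and print "COMMAND PID" once
# per distinct process; the first line is lsof's column header.
port_holders() {
    awk 'NR > 1 { print $1, $2 }' | sort -u
}

# Succeed only if UDP 162 is held, and held exclusively by java processes.
holders_ok() {
    if [ -z "$1" ]; then
        echo "nothing is bound to UDP port 162" >&2
        return 1
    fi
    others=$(printf '%s\n' "$1" | awk '$1 != "java"')
    if [ -n "$others" ]; then
        echo "UDP port 162 is held by another process:" >&2
        printf '%s\n' "$others" >&2
        return 1
    fi
    return 0
}

check_asr_trap_port() {
    # -n and -P stop lsof from resolving host and port names, so the
    # port is shown as 162 rather than "snmptrap".
    holders=$(lsof -nP -iUDP:162 | port_holders)
    if holders_ok "$holders"; then
        echo "UDP port 162 is held by java (ASR manager)"
        return 0
    fi
    # sasm records the bind failure in its log but keeps running.
    grep -F 'Cannot bind to port' "$SASM_LOG" | tail -n 1
    return 1
}
```

Source the script and run `check_asr_trap_port` as root. On the host above, before the fix, it would have flagged snmptrapd as the holder of port 162 and echoed the most recent "Cannot bind to port" line from sasm.log. Keeping the lsof parsing in its own function makes the logic easy to test against captured output without touching live sockets.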
