Pythian Blog: Technical Track

Troubleshooting failed database startup after GRID Out Of Place (OOP) rollback

GRID Out Of Place (OOP) patching completed successfully for 18.6.0.0.0. GRID_HOME=/u01/18.3.0.0/grid_2 ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1 Here is an example of the inventory after patching. [code] + /u01/18.3.0.0/grid_2/OPatch/opatch lspatches 29302264;OCW RELEASE UPDATE 18.6.0.0.0 (29302264) 29301643;ACFS RELEASE UPDATE 18.6.0.0.0 (29301643) 29301631;Database Release Update : 18.6.0.0.190416 (29301631) 28547619;TOMCAT RELEASE UPDATE 18.0.0.0.0 (28547619) 28435192;DBWLM RELEASE UPDATE 18.0.0.0.0 (28435192) 27908644;UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171 27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415) + /u01/app/oracle/12.1.0.1/db1/OPatch/opatch lspatches 28731800;Database Bundle Patch : 12.1.0.2.190115 (28731800) 28729213;OCW PATCH SET UPDATE 12.1.0.2.190115 (28729213) [/code] Run cluvfy was successful, too. [code] [oracle@racnode-dc1-1 ~]$ cluvfy stage -post crsinst -n racnode-dc1-1,racnode-dc1-2 -verbose Post-check for cluster services setup was successful. CVU operation performed: stage -post crsinst Date: Apr 30, 2019 8:17:49 PM CVU home: /u01/18.3.0.0/grid_2/ User: oracle [oracle@racnode-dc1-1 ~]$ [/code] GRID OOP Rollback Patching completed successfully for node1. [code] [root@racnode-dc1-1 ~]# crsctl check cluster -all ************************************************************** racnode-dc1-1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** racnode-dc1-2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** [root@racnode-dc1-1 ~]# [root@racnode-dc1-1 ~]# echo $GRID_HOME /u01/18.3.0.0/grid_2 [root@racnode-dc1-1 ~]# $GRID_HOME/OPatch/opatchauto rollback -switch-clone -logLevel FINEST OPatchauto session is initiated at Fri May 3 01:06:47 2019 System initialization log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchautodb/systemconfig2019-05-03_01-06-50AM.log. Session log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchauto/opatchauto2019-05-03_01-08-00AM.log The id for this session is R47N Update nodelist in the inventory for oracle home /u01/18.3.0.0/grid. Update nodelist in the inventory is completed for oracle home /u01/18.3.0.0/grid. Bringing down CRS service on home /u01/18.3.0.0/grid CRS service brought down successfully on home /u01/18.3.0.0/grid Starting CRS service on home /u01/18.3.0.0/grid CRS service started successfully on home /u01/18.3.0.0/grid Confirm that all resources have been started from home /u01/18.3.0.0/grid. All resources have been started successfully from home /u01/18.3.0.0/grid. OPatchAuto successful. --------------------------------Summary-------------------------------- Out of place patching clone home(s) summary ____________________________________________ Host : racnode-dc1-1 Actual Home : /u01/18.3.0.0/grid_2 Version:18.0.0.0.0 Clone Home Path : /u01/18.3.0.0/grid Following homes are skipped during patching as patches are not applicable: /u01/app/oracle/12.1.0.1/db1 OPatchauto session completed at Fri May 3 01:14:25 2019 Time taken to complete the session 7 minutes, 38 seconds [root@racnode-dc1-1 ~]# crsctl check cluster -all ************************************************************** racnode-dc1-1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** racnode-dc1-2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** [root@racnode-dc1-1 ~]# /media/patch/findhomes.sh PID NAME ORACLE_HOME 10486 asm_pmon_+asm1 /u01/18.3.0.0/grid/ 10833 apx_pmon_+apx1 /u01/18.3.0.0/grid/ [root@racnode-dc1-1 ~]# cat /etc/oratab #Backup file is /u01/app/oracle/12.1.0.1/db1/srvm/admin/oratab.bak.racnode-dc1-1 line added by Agent #+ASM1:/u01/18.3.0.0/grid:N hawk1:/u01/app/oracle/12.1.0.1/db1:N hawk:/u01/app/oracle/12.1.0.1/db1:N # line added by Agent [root@racnode-dc1-1 ~]# [/code] GRID OOP Rollback Patching completed successfully for node2. [code] [root@racnode-dc1-2 ~]# crsctl check cluster -all ************************************************************** racnode-dc1-1: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** racnode-dc1-2: CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online ************************************************************** [root@racnode-dc1-2 ~]# [root@racnode-dc1-2 ~]# echo $GRID_HOME /u01/18.3.0.0/grid_2 [root@racnode-dc1-2 ~]# $GRID_HOME/OPatch/opatchauto rollback -switch-clone -logLevel FINEST OPatchauto session is initiated at Fri May 3 01:21:39 2019 System initialization log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchautodb/systemconfig2019-05-03_01-21-41AM.log. Session log file is /u01/18.3.0.0/grid_2/cfgtoollogs/opatchauto/opatchauto2019-05-03_01-22-46AM.log The id for this session is 9RAT Update nodelist in the inventory for oracle home /u01/18.3.0.0/grid. Update nodelist in the inventory is completed for oracle home /u01/18.3.0.0/grid. Bringing down CRS service on home /u01/18.3.0.0/grid CRS service brought down successfully on home /u01/18.3.0.0/grid Starting CRS service on home /u01/18.3.0.0/grid CRS service started successfully on home /u01/18.3.0.0/grid Confirm that all resources have been started from home /u01/18.3.0.0/grid. All resources have been started successfully from home /u01/18.3.0.0/grid. OPatchAuto successful. --------------------------------Summary-------------------------------- Out of place patching clone home(s) summary ____________________________________________ Host : racnode-dc1-2 Actual Home : /u01/18.3.0.0/grid_2 Version:18.0.0.0.0 Clone Home Path : /u01/18.3.0.0/grid Following homes are skipped during patching as patches are not applicable: /u01/app/oracle/12.1.0.1/db1 OPatchauto session completed at Fri May 3 01:40:51 2019 Time taken to complete the session 19 minutes, 12 seconds [root@racnode-dc1-2 ~]# [/code] GRID OOP Rollback completed successfully for 18.5.0.0.0. GRID_HOME=/u01/18.3.0.0/grid ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1 Here is an example of inventory after the rollback: [code] + /u01/18.3.0.0/grid/OPatch/opatch lspatches 28864607;ACFS RELEASE UPDATE 18.5.0.0.0 (28864607) 28864593;OCW RELEASE UPDATE 18.5.0.0.0 (28864593) 28822489;Database Release Update : 18.5.0.0.190115 (28822489) 28547619;TOMCAT RELEASE UPDATE 18.0.0.0.0 (28547619) 28435192;DBWLM RELEASE UPDATE 18.0.0.0.0 (28435192) 27908644;UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171 27923415;OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415) + /u01/app/oracle/12.1.0.1/db1/OPatch/opatch lspatches 28731800;Database Bundle Patch : 12.1.0.2.190115 (28731800) 28729213;OCW PATCH SET UPDATE 12.1.0.2.190115 (28729213) [/code] Validation shows database is OFFLINE, [code] + crsctl stat res -w '((TARGET != ONLINE) or (STATE != ONLINE)' -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.CRS.GHCHKPT.advm OFFLINE OFFLINE racnode-dc1-1 STABLE OFFLINE OFFLINE racnode-dc1-2 STABLE ora.crs.ghchkpt.acfs OFFLINE OFFLINE racnode-dc1-1 STABLE OFFLINE OFFLINE racnode-dc1-2 STABLE ora.helper OFFLINE OFFLINE racnode-dc1-1 STABLE OFFLINE OFFLINE racnode-dc1-2 IDLE,STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.hawk.db 1 ONLINE OFFLINE Instance Shutdown,STABLE 2 ONLINE OFFLINE Instance Shutdown,STABLE [/code] Start database FAILED. [code] [oracle@racnode-dc1-2 ~]$ . /media/patch/hawk.env The Oracle base has been set to /u01/app/oracle ORACLE_UNQNAME=hawk ORACLE_SID=hawk2 ORACLE_BASE=/u01/app/oracle ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1 Oracle Instance not alive for sid "hawk2" [oracle@racnode-dc1-2 ~]$ srvctl status database -d $ORACLE_UNQNAME -v Instance hawk1 is not running on node racnode-dc1-1 Instance hawk2 is not running on node racnode-dc1-2 [oracle@racnode-dc1-2 ~]$ srvctl start database -d $ORACLE_UNQNAME PRCR-1079 : Failed to start resource ora.hawk.db CRS-5017: The resource action "ora.hawk.db start" encountered the following error: ORA-01078: failure in processing system parameters ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora' ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora ORA-27140: attach to post/wait facility failed ORA-27300: OS system dependent operation:invalid_egid failed with status: 1 ORA-27301: OS failure message: Operation not permitted ORA-27302: failure occurred at: skgpwinit6 ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba) . For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-1/crs/trace/crsd_oraagent_oracle.trc". CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-1' failed CRS-2632: There are no more servers to try to place resource 'ora.hawk.db' on that would satisfy its placement policy CRS-5017: The resource action "ora.hawk.db start" encountered the following error: ORA-01078: failure in processing system parameters ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora' ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora ORA-27140: attach to post/wait facility failed ORA-27300: OS system dependent operation:invalid_egid failed with status: 1 ORA-27301: OS failure message: Operation not permitted ORA-27302: failure occurred at: skgpwinit6 ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba) . For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-2/crs/trace/crsd_oraagent_oracle.trc". CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-2' failed [oracle@racnode-dc1-2 ~]$ [oracle@racnode-dc1-1 ~]$ . /media/patch/hawk.env The Oracle base has been set to /u01/app/oracle ORACLE_UNQNAME=hawk ORACLE_SID=hawk1 ORACLE_BASE=/u01/app/oracle ORACLE_HOME=/u01/app/oracle/12.1.0.1/db1 Oracle Instance not alive for sid "hawk1" [oracle@racnode-dc1-1 ~]$ srvctl start database -d hawk PRCR-1079 : Failed to start resource ora.hawk.db CRS-5017: The resource action "ora.hawk.db start" encountered the following error: ORA-01078: failure in processing system parameters ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora' ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora ORA-27140: attach to post/wait facility failed ORA-27300: OS system dependent operation:invalid_egid failed with status: 1 ORA-27301: OS failure message: Operation not permitted ORA-27302: failure occurred at: skgpwinit6 ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba) . For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-2/crs/trace/crsd_oraagent_oracle.trc". CRS-5017: The resource action "ora.hawk.db start" encountered the following error: ORA-01078: failure in processing system parameters ORA-01565: error in identifying file '+DATA/hawk/spfilehawk.ora' ORA-17503: ksfdopn:10 Failed to open file +DATA/hawk/spfilehawk.ora ORA-27140: attach to post/wait facility failed ORA-27300: OS system dependent operation:invalid_egid failed with status: 1 ORA-27301: OS failure message: Operation not permitted ORA-27302: failure occurred at: skgpwinit6 ORA-27303: additional information: startup egid = 54321 (oinstall), current egid = 54322 (dba) . For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/racnode-dc1-1/crs/trace/crsd_oraagent_oracle.trc". CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-2' failed CRS-2632: There are no more servers to try to place resource 'ora.hawk.db' on that would satisfy its placement policy CRS-2674: Start of 'ora.hawk.db' on 'racnode-dc1-1' failed [oracle@racnode-dc1-1 ~]$ [/code] Incorrect permissions for the Oracle library was the cause. Changing permissions for $GRID_HOME/bin/oracle (chmod 6751 $GRID_HOME/bin/oracle), and stopping and starting CRS resolved the failure. [code] [oracle@racnode-dc1-1 dbs]$ ls -lhrt $ORACLE_HOME/bin/oracle -rwsr-s--x 1 oracle dba 314M Apr 20 16:06 /u01/app/oracle/12.1.0.1/db1/bin/oracle [oracle@racnode-dc1-1 dbs]$ ls -lhrt /u01/18.3.0.0/grid/bin/oracle -rwxr-x--x 1 oracle oinstall 396M Apr 20 19:21 /u01/18.3.0.0/grid/bin/oracle [oracle@racnode-dc1-1 dbs]$ cd /u01/18.3.0.0/grid/bin/ [oracle@racnode-dc1-1 bin]$ chmod 6751 oracle [oracle@racnode-dc1-1 bin]$ ls -lhrt /u01/18.3.0.0/grid/bin/oracle -rwsr-s--x 1 oracle oinstall 396M Apr 20 19:21 /u01/18.3.0.0/grid/bin/oracle [root@racnode-dc1-1 ~]# . /media/patch/gi.env The Oracle base has been set to /u01/app/oracle ORACLE_SID=+ASM1 ORACLE_BASE=/u01/app/oracle GRID_HOME=/u01/18.3.0.0/grid ORACLE_HOME=/u01/18.3.0.0/grid Oracle Instance alive for sid "+ASM1" [root@racnode-dc1-1 ~]# crsctl stop crs ==================================================================================================== [root@racnode-dc1-2 ~]# . /media/patch/gi.env The Oracle base has been set to /u01/app/oracle ORACLE_SID=+ASM2 ORACLE_BASE=/u01/app/oracle GRID_HOME=/u01/18.3.0.0/grid ORACLE_HOME=/u01/18.3.0.0/grid Oracle Instance alive for sid "+ASM2" [root@racnode-dc1-2 ~]# ls -lhrt $GRID_HOME/bin/oracle -rwxr-x--x 1 oracle oinstall 396M Apr 21 01:44 /u01/18.3.0.0/grid/bin/oracle [root@racnode-dc1-2 ~]# chmod 6751 $GRID_HOME/bin/oracle [root@racnode-dc1-2 ~]# ls -lhrt $GRID_HOME/bin/oracle -rwsr-s--x 1 oracle oinstall 396M Apr 21 01:44 /u01/18.3.0.0/grid/bin/oracle [root@racnode-dc1-2 ~]# crsctl stop crs ==================================================================================================== [root@racnode-dc1-2 ~]# crsctl start crs [root@racnode-dc1-1 ~]# crsctl start crs [/code] Reference: RAC Database Can't Start: ORA-01565, ORA-17503: ksfdopn:10 Failed to open file +DATA/BPBL/spfileBPBL.ora (Doc ID 2316088.1) In conclusion, while the GRID rollback may have completed successfully, the database was down and resulted in an outage. It may be prudent to check permissions for the Oracle library before a patching or rollback to avoid any disasters.

No Comments Yet

Let us know what you think

Subscribe by email