
How I Finished GI OOP Patching From 19.6 to 19.8 After Facing "cluutil: No Such File or Directory" and CLSRSC-740 Errors

This past weekend I was doing production grid infrastructure (GI) out-of-place patching (OOP) from 19.6 to 19.8 for a client. During this exercise, I hit a couple of bugs (20785766 and 27554103).

This post explains how I solved them. I hope it saves you a lot of time if you ever face these issues.

As I have already blogged in the past about how to do a GI OOP, I won't go into all the details of the process here; I'll only address the steps relevant to today's post.

I ran the switchGridHome from 19.6 to 19.8 without any issues and successfully ran root.sh on node1.

[grid@hostname1 grid]$ ./gridSetup.sh -switchGridHome -silent
 Launching Oracle Grid Infrastructure Setup Wizard...
 
 You can find the log of this install session at:
  /u01/app/oraInventory/logs/cloneActions2020-11-20_09-10-17PM.log
 
 
 As a root user, execute the following script(s):
  1. /u01/app/19.8.0.0/grid/root.sh
 
 Execute /u01/app/19.8.0.0/grid/root.sh on the following nodes:
 [hostname1, hostname2]
 
 Run the scripts on the local node first. After successful completion, run the scripts in sequence on all other nodes.
 
 Successfully Setup Software.
 ...
 [root@hostname1 ~]# /u01/app/19.8.0.0/grid/root.sh
 Check /u01/app/19.8.0.0/grid/install/root_hostname1_2020-11-20_21-13-24-032842094.log for the output of root script
 

When I ran root.sh on node2, I ran into the error "The CRS executable file 'clsecho' does not exist." I checked, and indeed the file didn't exist in GI_HOME/bin. Comparing node1 and node2 showed that the bin directory on node2 was missing about 100 files.

[root@hostname2 ~]$ /u01/app/19.8.0.0/grid/root.sh
 Check /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log for the output of root script
 
 [root@hostname2 ~]$ tail /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log
 2020-11-20 21:42:27: The 'ROOTCRS_PREPATCH' is either in START/FAILED state
 2020-11-20 21:42:27: The CRS executable file /u01/app/19.8.0.0/grid/bin/cluutil either does not exist or is not executable
 2020-11-20 21:42:27: Invoking "/u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status"
 2020-11-20 21:42:27: trace file=/u01/app/oracle/crsdata/hostname2/crsconfig/cluutil3.log
 2020-11-20 21:42:27: Running as user grid: /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
 2020-11-20 21:42:27: Removing file /tmp/X9bxqSWx3c
 2020-11-20 21:42:27: Successfully removed file: /tmp/X9bxqSWx3c
 2020-11-20 21:42:27: pipe exit code: 32512
 2020-11-20 21:42:27: /bin/su exited with rc=127
 
 2020-11-20 21:42:27: bash: /u01/app/19.8.0.0/grid/bin/cluutil: No such file or directory
 
 2020-11-20 21:42:27: The CRS executable file /u01/app/19.8.0.0/grid/bin/clsecho either does not exist or is not executable
 2020-11-20 21:42:27: The CRS executable file 'clsecho' does not exist.
 2020-11-20 21:42:27: ###### Begin DIE Stack Trace ######
 2020-11-20 21:42:27: Package File Line Calling
 2020-11-20 21:42:27: --------------- -------------------- ---- ----------
 2020-11-20 21:42:27: 1: main rootcrs.pl 357 crsutils::dietrap
 2020-11-20 21:42:27: 2: crspatch crspatch.pm 2815 main::__ANON__
 2020-11-20 21:42:27: 3: crspatch crspatch.pm 2203 crspatch::postPatchRerunCheck
 2020-11-20 21:42:27: 4: crspatch crspatch.pm 2015 crspatch::crsPostPatchCkpts
 2020-11-20 21:42:27: 5: crspatch crspatch.pm 394 crspatch::crsPostPatch
 2020-11-20 21:42:27: 6: main rootcrs.pl 370 crspatch::new
 2020-11-20 21:42:27: ####### End DIE Stack Trace #######
 
 2020-11-20 21:42:27: checkpoint has failed
 
 ########################################################################
 ## Difference of Number of files between node1 and node2
 ########################################################################
 [root@hostname1 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
 405
 [root@hostname2 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
 303
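
If you want to see exactly which binaries are missing rather than just the counts, comparing the directory listings with comm does the trick. This is only a rough sketch run from node1 (it assumes passwordless ssh to node2 and the same paths as above), not the exact commands I ran that night:

 ########################################################################
 ## Sketch: list files present in node1's GI_HOME/bin but missing on node2
 ########################################################################
 ls /u01/app/19.8.0.0/grid/bin | sort > /tmp/node1_bin.txt
 ssh hostname2 'ls /u01/app/19.8.0.0/grid/bin' | sort > /tmp/node2_bin.txt
 comm -23 /tmp/node1_bin.txt /tmp/node2_bin.txt   # names unique to node1 = missing on node2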

The first thing I did after the failure was to check the status of the cluster with Fred Denis's rac_status script. I found that everything was up and the cluster upgrade state was ROLLING PATCH: CRS was running with the 19.8 version on node1 and with the 19.6 version on node2.

[grid@hostname1 antunez]$ ./rac_status.sh -a
 
  Cluster rene-ace-cluster
 
  Type | Name | hostname1 | hostname2 |
  ---------------------------------------------------------------------------
  MGMTLSNR | MGMTLSNR | Online | - |
  asm | asm | Online | Online |
  asmnetwork | asmnet1 | Online | Online |
  chad | chad | Online | Online |
  cvu | cvu | - | Online |
  dg | ORAARCH | Online | Online |
  dg | ORACRS | Online | Online |
  dg | ORADATA | Online | Online |
  dg | ORAFLASHBACK | Online | Online |
  dg | ORAREDO | Online | Online |
  network | net1 | Online | Online |
  ons | ons | Online | Online |
  qosmserver | qosmserver | - | Online |
  vip | hostname1 | Online | - |
  vip | hostname2 | - | Online |
  vip | scan1 | Online | - |
  vip | scan2 | - | Online |
  vip | scan3 | - | Online |
  ---------------------------------------------------------------------------
  x : Resource is disabled
  : Has been restarted less than 24 hours ago
  : STATUS and TARGET are different
 
  Listener | Port | hostname1 | hostname2 | Type |
  ------------------------------------------------------------------------------------------
  ASMNET1LSNR_ASM| TCP:1526 | Online | Online | Listener |
  LISTENER | TCP:1521,1525 | Online | Online | Listener |
  LISTENER_SCAN1 | TCP:1521,1525 | Online | - | SCAN |
  LISTENER_SCAN2 | TCP:1521,1525 | - | Online | SCAN |
  LISTENER_SCAN3 | TCP:1521,1525 | - | Online | SCAN |
  ------------------------------------------------------------------------------------------
  : Has been restarted less than 24 hours ago
 
  DB | Version | hostname1 | hostname2 | DB Type |
  ------------------------------------------------------------------------------------------
  mgm | (2) | Open | - | MGMTDB (P) |
  prod | 12.1.0 (1) | Open | Open | RAC (P) |
  ------------------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column ("''" means "same as above")
 
  1 : /u01/app/oracle/product/12.1.0/db_1 oracle oinstall
  2 : %CRS_HOME% grid ''
 
  : Has been restarted less than 24 hours ago
  : STATUS and TARGET are different
 
 [grid@hostname1 antunez]$ crsctl query crs activeversion -f
 Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [2701864972].
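
For the record, you can also compare the Clusterware patch level that each node reports while the patch is rolling. I'm only sketching the commands here, not output I captured at the time:

 ########################################################################
 ## Sketch: per-node patch level while the cluster is in ROLLING PATCH
 ########################################################################
 crsctl query crs softwarepatch hostname1   # should already report the 19.8 patch level
 crsctl query crs softwarepatch hostname2   # should still report the 19.6 patch level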

I found MOS note "Grid Infrastructure root script (root.sh etc) fails as remote node missing binaries (Doc ID 1991928.1)." It explains there's a bug (20785766) in the 12.1 GI installer that leaves files missing from GI_HOME/bin and/or GI_HOME/lib. Even though the document mentions 12.1, I hit it with 19.8. It applied to my issue, so I did what the note says, which is:

"... the workaround is to manually copy missing files from the node where installer was started and re-run root script."

I excluded the soft link lbuilder, as it had already been created on the second node. I also changed the ownership of the GI_HOME/bin files on node2 to root:oinstall.

########################################################################
 ## From node2
 ########################################################################
 [root@hostname2 bin]# ls -al | grep "lbuilder"
 lrwxrwxrwx. 1 grid oinstall 24 Nov 20 21:10 lbuilder -> ../nls/lbuilder/lbuilder
 
 ########################################################################
 ## From node1
 ########################################################################
 [root@hostname1 ~]$ cd /u01/app/19.8.0.0/grid/bin 
 [root@hostname1 ~]$ find . ! -name "lbuilder" | xargs -i scp {} hostname2:/u01/app/19.8.0.0/grid/bin
 
 ########################################################################
 ## Difference of Number of files between node1 and node2
 ########################################################################
 [root@hostname1 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
 405
 [root@hostname2 ~]$ ls -ltr /u01/app/19.8.0.0/grid/bin | wc -l
 405
 
 ########################################################################
 ## Changed the ownership to root:oinstall in hostname2
 ########################################################################
 [root@hostname2 ~]$ cd /u01/app/19.8.0.0/grid/bin 
 [root@hostname2 bin]$ chown root:oinstall ./*
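
Before relinking, it doesn't hurt to confirm the copy is complete by comparing checksums across the nodes. Again, this is just a sketch run from node1 (it assumes passwordless ssh as root and the same GI_HOME path on both nodes):

 ########################################################################
 ## Sketch: compare checksums of GI_HOME/bin between node1 and node2
 ########################################################################
 cd /u01/app/19.8.0.0/grid/bin
 find . -maxdepth 1 -type f -exec md5sum {} + | sort -k2 > /tmp/node1_bin.md5
 ssh hostname2 'cd /u01/app/19.8.0.0/grid/bin && find . -maxdepth 1 -type f -exec md5sum {} + | sort -k2' > /tmp/node2_bin.md5
 diff /tmp/node1_bin.md5 /tmp/node2_bin.md5   # no output means the copied files match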

Now that I had copied the files, I relinked the GI_HOME on node2, following the relink documentation note, as the sticky bits were lost with the scp.

A few notes on the relink in this situation:

  1. As the active GI binaries on node2 were still from the 19.6 GI_HOME, I didn't need to run rootcrs.sh -unlock.
  2. I didn't run rootadd_rdbms.sh, as it runs as part of /u01/app/19.8.0.0/grid/root.sh, which I was going to rerun after the fix above.
  3. Similar to point 1, I didn't need to run rootcrs.sh -lock.

[grid@hostname2 ~]$ export ORACLE_HOME=/u01/app/19.8.0.0/grid
 [grid@hostname2 ~]$ $ORACLE_HOME/bin/relink
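
It's also worth scanning the relink output for problems before rerunning root.sh. The relink utility normally logs to $ORACLE_HOME/install/relink.log, but treat that path as an assumption and use whatever location the command itself prints:

 ########################################################################
 ## Sketch: check the relink log for problems (log path assumed)
 ########################################################################
 grep -iE "error|fatal|undefined symbol" /u01/app/19.8.0.0/grid/install/relink.log   # no matches is what you want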

After the relink, I reran /u01/app/19.8.0.0/grid/root.sh on node2. This time I received a new error: "CLSRSC-740: inconsistent options specified to the postpatch command."

[root@hostname2 ~]$ /u01/app/19.8.0.0/grid/root.sh
 Check /u01/app/19.8.0.0/grid/install/crs_postpatch_hostname2_2020-11-20_11-39-26PM.log for the output of root script
 
 [root@hostname2 ~]$ tail /u01/app/19.8.0.0/grid/install/crs_postpatch_hostname2_2020-11-20_11-39-26PM.log
 
 2020-11-20 23:39:28: NONROLLING=0
 
 2020-11-20 23:39:28: Succeeded to get property value:NONROLLING=0
 
 2020-11-20 23:39:28: Executing cmd: /u01/app/19.8.0.0/grid/bin/clsecho -p has -f clsrsc -m 740
 2020-11-20 23:39:28: Executing cmd: /u01/app/19.8.0.0/grid/bin/clsecho -p has -f clsrsc -m 740
 2020-11-20 23:39:28: Command output:
 > CLSRSC-740: inconsistent options specified to the postpatch command
 >End Command output
 2020-11-20 23:39:28: CLSRSC-740: inconsistent options specified to the postpatch command
 2020-11-20 23:39:28: ###### Begin DIE Stack Trace ######
 2020-11-20 23:39:28: Package File Line Calling
 2020-11-20 23:39:28: --------------- -------------------- ---- ----------
 2020-11-20 23:39:28: 1: main rootcrs.pl 357 crsutils::dietrap
 2020-11-20 23:39:28: 2: crspatch crspatch.pm 2212 main::__ANON__
 2020-11-20 23:39:28: 3: crspatch crspatch.pm 2015 crspatch::crsPostPatchCkpts
 2020-11-20 23:39:28: 4: crspatch crspatch.pm 394 crspatch::crsPostPatch
 2020-11-20 23:39:28: 5: main rootcrs.pl 370 crspatch::new
 2020-11-20 23:39:28: ####### End DIE Stack Trace #######
 
 2020-11-20 23:39:28: checkpoint has failed

Investigating further, I saw that the ROOTCRS_PREPATCH checkpoint had been marked as SUCCESS by the previous failed run of root.sh.

[grid@hostname2 ~]$ /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
 SUCCESS
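
If you're curious where this state is persisted, the cluutil trace paths above point at $ORACLE_BASE/crsdata/<node>/crsconfig. The checkpoint file name I'm using below (ckptGridHA_hostname2.xml) is an assumption based on that layout, so verify it on your system before relying on it:

 ########################################################################
 ## Sketch: look at the persisted checkpoint state (file name assumed)
 ########################################################################
 grep "ROOTCRS_PREPATCH" /u01/app/oracle/crsdata/hostname2/crsconfig/ckptGridHA_hostname2.xml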

Further investigation showed that this error was part of bug 27554103. I solved it by setting the ROOTCRS_PREPATCH checkpoint back to the START state and rerunning /u01/app/19.8.0.0/grid/root.sh on node2.

[root@hostname2 ~]# /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -writeckpt -name ROOTCRS_PREPATCH -state START
 
 [root@hostname2 ~]# /u01/app/19.8.0.0/grid/bin/cluutil -ckpt -oraclebase /u01/app/oracle -chkckpt -name ROOTCRS_PREPATCH -status
 START
 
 [root@hostname2 ~]# /u01/app/19.8.0.0/grid/root.sh
 Check /u01/app/19.8.0.0/grid/install/root_hostname2_2020-11-21_03-53-47-360707303.log for the output of root script

After completing the steps above, everything was as it should be on both nodes and the cluster upgrade state was back to NORMAL.

[grid@hostname1 antunez]$ ./rac_status.sh -a
 
  Cluster rene-ace-cluster
 
  Type | Name | hostname1 | hostname2 |
  ---------------------------------------------------------------------------
  MGMTLSNR | MGMTLSNR | Online | - |
  asm | asm | Online | Online |
  asmnetwork | asmnet1 | Online | Online |
  chad | chad | Online | Online |
  cvu | cvu | - | Online |
  dg | ORAARCH | Online | Online |
  dg | ORACRS | Online | Online |
  dg | ORADATA | Online | Online |
  dg | ORAFLASHBACK | Online | Online |
  dg | ORAREDO | Online | Online |
  network | net1 | Online | Online |
  ons | ons | Online | Online |
  qosmserver | qosmserver | - | Online |
  vip | hostname1 | Online | - |
  vip | hostname2 | - | Online |
  vip | scan1 | Online | - |
  vip | scan2 | - | Online |
  vip | scan3 | - | Online |
  ---------------------------------------------------------------------------
  x : Resource is disabled
  : Has been restarted less than 24 hours ago
  : STATUS and TARGET are different
 
  Listener | Port | hostname1 | hostname2 | Type |
  ------------------------------------------------------------------------------------------
  ASMNET1LSNR_ASM| TCP:1526 | Online | Online | Listener |
  LISTENER | TCP:1521,1525 | Online | Online | Listener |
  LISTENER_SCAN1 | TCP:1521,1525 | Online | - | SCAN |
  LISTENER_SCAN2 | TCP:1521,1525 | - | Online | SCAN |
  LISTENER_SCAN3 | TCP:1521,1525 | - | Online | SCAN |
  ------------------------------------------------------------------------------------------
  : Has been restarted less than 24 hours ago
 
  DB | Version | hostname1 | hostname2 | DB Type |
  ------------------------------------------------------------------------------------------
  mgm | (2) | Open | - | MGMTDB (P) |
  prod | 12.1.0 (1) | Open | Open | RAC (P) |
  ------------------------------------------------------------------------------------------
  ORACLE_HOME references listed in the Version column ("''" means "same as above")
 
  1 : /u01/app/oracle/product/12.1.0/db_1 oracle oinstall
  2 : %CRS_HOME% grid ''
 
  : Has been restarted less than 24 hours ago
  : STATUS and TARGET are different
 
 [grid@hostname1 antunez]$ crsctl query crs activeversion -f
 Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [441346801].
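
As a final sanity check, it's worth confirming the patch inventory from the new home on each node as well. Again, just the commands, not output I kept from that night:

 ########################################################################
 ## Sketch: confirm the installed patches from the 19.8 grid home
 ########################################################################
 /u01/app/19.8.0.0/grid/OPatch/opatch lspatches
 crsctl query crs releasepatch   # release patch level plus the list of installed patches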

Hopefully this blog post saves you a few headaches and some long overnight hours if you ever hit these two bugs while doing an OOP of your 19.x GI.

Note: This was originally published on rene-ace.com.
