
Oracle Silent Mode, Part 6: Removing a Node From a 10.2 RAC

This sixth post describes how to remove a node from a 10.2 RAC cluster in silent mode. It differs from the associated documentation in that it shows how to remove a node even if that node has become unavailable for any reason, including an error by a DBA or an SA.

Here is the complete series agenda:

  1. Installation of 10.2 And 11.1 Databases
  2. Patches of 10.2 And 11.1 databases
  3. Cloning Software and databases
  4. Install a 10.2 RAC Database
  5. Add a Node to a 10.2 RAC database
  6. Remove a Node from a 10.2 RAC database (this post!)
  7. Install a 11.1 RAC Database
  8. Add a Node to a 11.1 RAC database
  9. Remove a Node from a 11.1 RAC database
  10. A ton of other stuff you should know

Now for the substance of this part.

Back Up the Voting Disk

A good way to start is usually to think about the worst thing that could happen and how to back out if you mess something up. There is probably less risk when you remove a node than when you add a new one. However, make sure you have a backup of the voting disk and the OCR. To proceed, run the dd and ocrconfig commands as below:

rac-server5$ cd $ORA_CRS_HOME/bin
rac-server5$ ./crsctl query css votedisk
rac-server5$ mkdir -p /home/oracle/backup
rac-server5$ dd if=/dev/sdb5                        \
                of=/home/oracle/backup/votedisk.bak \
                bs=4k
rac-server5$ ./ocrconfig -showbackup
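The -showbackup command only lists the automatic OCR backups maintained by the clusterware. If you also want a manual export of the OCR, a minimal sketch, run as root and with an example target file name, looks like this:

rac-server5$ su -
rac-server5# cd /u01/app/crs/bin
rac-server5# ./ocrconfig -export /home/oracle/backup/ocr.bak   # example file name
rac-server5# exit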

Remove the Services from the Instance You Plan to Delete

Using srvctl modify, update the services so that they don’t contain the instance you plan to remove:

rac-server5$ # The command below lists the services
rac-server5$ srvctl config service -d ORCL
rac-server5$ # The command below modifies the OLTP service
rac-server5$ srvctl modify service -d ORCL \
                -s OLTP -n                 \
                -i "ORCL1,ORCL2,ORCL3,ORCL4"

Remove the Instance from the Node with DBCA

DBCA is probably the easiest and fastest way to delete an instance from a RAC database. If we assume you want to remove rac-server5 from the RAC configuration we’ve built in post 4 and post 5, you’ll start by running the command below from any of the servers:

rac-server5$ . oraenv
             ORCL 
rac-server5$ dbca -silent -deleteInstance \
         -gdbName ORCL                    \
         -instanceName ORCL5              \
         -sysDBAPassword xxx

You need the SYS password, or at least the password of a user with SYSDBA privileges.
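To confirm the instance is gone from the clusterware configuration, a quick check from any of the remaining nodes looks like this (plain 10.2 srvctl commands):

rac-server1$ srvctl config database -d ORCL
rac-server1$ srvctl status database -d ORCL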

Manually Remove the Instance from the Node

In some situations, it’s useful to know how to manually delete an instance from a database. In that case, the steps to follow are these.

Step 1: Stop the instance you want to remove *

Use srvctl as below. If the server is gone, you most likely won't have to (or be able to) stop the instance.

rac-server1$ srvctl stop instance -d ORCL -i ORCL5
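If you are not sure whether the instance is still running (for example, because the server may already be gone), you can query its status before or after issuing the stop:

rac-server1$ srvctl status instance -d ORCL -i ORCL5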

Step 2: Remove the instance from the clusterware

Once the instance is stopped, you can delete it from the database's list of instances by running the command below from any one of the cluster nodes:

rac-server1$ srvctl remove instance -d ORCL -i ORCL5

Step 3: Remove the init.ora and password file *

Connect to the server you are removing and delete these two files:

rac-server5$ cd $ORACLE_HOME/dbs
rac-server5$ rm initORCL5.ora
rac-server5$ rm orapwORCL5

Step 4: Remove the parameters from the spfile

For the instance you’ve stopped, display the parameters that are set at the instance level with the query below on any of the remaining instances:

SQL> col name format a30
SQL> col value format a78
SQL> set lines 120
SQL> select name, value
       from v$spparameter
      where sid='ORCL5';

Reset all those parameters from the spfile as below:

SQL> alter system reset thread 
         scope=spfile sid='ORCL5';
SQL> alter system reset instance_number
         scope=spfile sid='ORCL5';
SQL> alter system reset local_listener
         scope=spfile sid='ORCL5';
SQL> alter system reset undo_tablespace
         scope=spfile sid='ORCL5';

Step 5: Remove the UNDO tablespace

From Step 4, you should have figured out which UNDO tablespace the instance was using. You can double-check it as sketched below, and then drop it.
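A minimal check reuses the v$spparameter query from Step 4; UNDOTBS5 in the drop statement that follows matches the naming convention used throughout this series:

SQL> select value
       from v$spparameter
      where sid='ORCL5'
        and name='undo_tablespace';

Once you are sure it is the right tablespace, drop it: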

SQL> drop tablespace UNDOTBS5
          including contents and datafiles;

Step 6: Drop the redo log thread

From Step 4, you should have figured out what the instance thread was. You can disable it and drop the associated redo log groups:

SQL> alter database disable thread 5;
SQL> select group# 
      from v$log 
     where thread#=5;
SQL> alter database drop logfile group 13;
SQL> alter database drop logfile group 14;
SQL> alter database drop logfile group 15;

Step 7: Change the TNS aliases

The easiest way to manage the network files is to have the exact same files on each one of the servers. The entries you’d want to have in the tnsnames.ora file are:

  • LISTENER_<server_name> for each one of the servers. These aliases point to the VIP end of the listener from each one of the servers and are used in the local_listener parameter of each instance.
  • LISTENERS_<gDbName> is an alias that points to all the listener VIPs, and is used by the remote_listener parameter.
  • <gDbName> is an alias that points to all the listener VIPs to connect to the database.
  • <Instance_Name> are aliases that point to the local listener and specify the instance_name parameter to force the connection to a specific instance.

Edit the tnsnames.ora files on all the nodes and remove the rac-server5 VIP address from the ORCL and LISTENERS_ORCL aliases. Also remove the ORCL5 and LISTENER_ORCL5 aliases.
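As an illustration only (the listener port and the exact layout depend on how the files were built in parts 4 and 5), the LISTENERS_ORCL alias should end up without the rac-server5 VIP address, something like:

# port 1521 is an assumption; use your actual listener port
LISTENERS_ORCL =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-server1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-server2-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-server3-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-server4-vip)(PORT = 1521))
  )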

Step 8: Delete the administration directories *

Locate the various administration directories for the instance you are removing, and remove them from the server:

rac-server5$ cd /u01/app/oracle/admin
rac-server5$ rm -rf ORCL

Step 9: Remove the instance prefix *

Edit the oratab file and delete the entry for the RAC database on the node you are removing.
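For illustration, the entry to delete looks something like the line below; the trailing flag and the exact home path depend on your installation (the home shown is the one used throughout this series):

# the :N flag is only an example; keep whatever your oratab actually contains
ORCL:/u01/app/oracle/product/10.2.0/db_1:N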

Remove ASM from the Node

None of the assistants will give you a hand with this, but deleting an ASM instance is straightforward. It consists of (1) stopping the ASM instance*; (2) removing the ASM instance from the OCR; and (3) deleting the ASM init.ora file*. Execute the commands below:

rac-server1$ srvctl stop asm -n rac-server5
rac-server1$ su -

rac-server1# srvctl remove asm -n rac-server5
rac-server1# exit
rac-server1$ ssh rac-server5
rac-server5$ rm $ORACLE_HOME/dbs/init+ASM5.ora

Remove the Listener from the Node

The only supported way to remove the listener with 10g is to use NETCA. If the server is still part of the cluster, up and running, you can just run the command below:

rac-server5$ export DISPLAY=:1
rac-server5$ netca /silent /deinst /nodeinfo rac-server5

If you don’t have all the nodes up and running, this command will fail, and the only way to remove the listener with 10.2, even if not supported, will be to run crs_unregister as below from one of the remaining nodes. (Does anybody want to comment on that practice?):

rac-server1$ cd $ORA_CRS_HOME/bin
rac-server1$ ./crs_stat |grep lsnr
rac-server1$ ./crs_unregister \
       ora.rac-server5.LISTENER_RAC-SERVER5.lsnr

Be careful! It works with the listeners, but it won’t work with any other Oracle resource. If you run that command and it fails for any reason, you’ll have to restore the OCR.
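For reference, restoring the OCR from one of its automatic backups would look roughly like the sketch below. The clusterware must be stopped on all the nodes first, and the backup file name is only an example; take the real one from the -showbackup output:

rac-server1$ su -
rac-server1# cd /u01/app/crs/bin
rac-server1# ./crsctl stop crs      # repeat on every node
rac-server1# ./ocrconfig -showbackup
rac-server1# ./ocrconfig -restore /u01/app/crs/cdata/crs/backup00.ocr   # example file name
rac-server1# ./crsctl start crs     # repeat on every node
rac-server1# exit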

Remove the Database Software

When removing the software, there are two separate things to do: (1) update the Oracle Inventory on all the remaining nodes so that the installer no longer associates the software with the removed server; (2) remove the software from that node*. This second operation is required only if the node is to be reused.

Update the inventory of the remaining nodes

Oracle Universal Installer (OUI) can do this from any of the remaining nodes. What you’ll declare in that specific case is that only rac-server1, rac-server2, rac-server3, and rac-server4 will still be part of the clustered installation. In order to update the inventories of these nodes, run:

rac-server1$ export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
rac-server1$ cd $ORACLE_HOME/oui/bin
rac-server1$ ./runInstaller -silent -updateNodeList \
             ORACLE_HOME=$ORACLE_HOME               \ 
             "CLUSTER_NODES={rac-server1,rac-server2,rac-server3,rac-server4}"

Once you’ve updated the inventory, OUI will never again prompt you for rac-server5 when used from any of these four nodes.

Delete the Database Software from the node to be removed *

One of the limits of the OUI with RAC is that you cannot use it for one node only. In a way that’s probably a good thing, as you cannot apply a Patch Set on one node only and mess everything up. Unfortunately, when you do want to remove the software from one server only, you have to work around that limit. The way to do it is to update the inventory of that node only and let it think it is the only node of the cluster. Once you’ve done that, you’ll be able to execute the OUI with various syntaxes such as detachHome or deinstall. To change the inventory on the node to be removed, connect to it and run:

rac-server5$ export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
rac-server5$ cd $ORACLE_HOME/oui/bin
rac-server5$ ./runInstaller -silent -updateNodeList \
             ORACLE_HOME=$ORACLE_HOME               \ 
             "CLUSTER_NODES={rac-server5}"          \
             -local

Once the inventory is updated, every runInstaller command you issue will touch only the local ORACLE_HOME (assuming it is not shared). You can then, as described in the documentation, remove the ORACLE_HOME you want and withdraw it from the cluster:

rac-server5$ cat /etc/oraInst.loc
rac-server5$ cd /u01/app/oraInventory/ContentsXML
rac-server5$ grep NAME inventory.xml
rac-server5$ export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
rac-server5$ cd $ORACLE_HOME/oui/bin
rac-server5$ ./runInstaller -silent -deinstall -removeallfiles \
              "REMOVE_HOMES={/u01/app/oracle/product/10.2.0/db_1}"

You can also detach the ORACLE_HOME from the inventory with the -detachHome syntax as below:

rac-server5$ cat /etc/oraInst.loc
rac-server5$ cd /u01/app/oraInventory/ContentsXML
rac-server5$ grep NAME inventory.xml
rac-server5$ export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
rac-server5$ cd $ORACLE_HOME/oui/bin
rac-server5$ ./runInstaller -silent -detachHome                 \
              ORACLE_HOME="/u01/app/oracle/product/10.2.0/db_1" \
              ORACLE_HOME_NAME="OraDB102Home1"

This second approach allows you to keep the ORACLE_HOME and delete its content only if and when you want:

rac-server5$ rm -rf /u01/app/oracle/product/10.2.0/db_1

If that’s the last database software installed on the server, you can delete the oratab file too:

rac-server5$ rm /etc/oratab

Remove the ONS Configuration

In order to remove the ONS subscription from the server, you can first query the ons.config file as below:

rac-server5$ cd $ORA_CRS_HOME/opmn/conf
rac-server5$ grep remoteport ons.config

If you cannot access the server because it is not available anymore, you can instead dump the OCR with ocrdump and look at the ONS configuration in the resulting dump, as sketched after the next command. Once you know which port has to be deleted from the configuration, remove it from the cluster registry from any of the nodes:

rac-server1$ cd $ORA_CRS_HOME/bin
rac-server1$ ./racgons remove_config rac-server5:6200
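Here is a minimal sketch of the ocrdump approach mentioned above; the dump file name is arbitrary and the grep pattern is just a convenient way to spot the ONS keys (run it as root if your OCR device permissions require it):

rac-server1$ su -
rac-server1# /u01/app/crs/bin/ocrdump /tmp/ocr.dmp   # /tmp/ocr.dmp is an arbitrary file name
rac-server1# grep -i ons /tmp/ocr.dmp
rac-server1# exit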

Remove the NodeApps

The nodeapps include the GSD, the ONS, and the VIP. You can simply remove them from any of the nodes with the srvctl remove nodeapps command:

rac-server1$ srvctl stop nodeapps -n rac-server5
rac-server1$ su -
rac-server1# /u01/app/crs/bin/srvctl remove nodeapps -n rac-server5

You can check that the nodeapps have been removed by querying their status with srvctl or by checking the resources whose names start with ora.rac-server5, as below:

rac-server1$ cd $ORA_CRS_HOME/bin
rac-server1$ ./crs_stat |grep "ora.rac-server5"
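The srvctl form of the same check is shown below; once the node applications are gone, it should report that there is nothing left to show (or simply error out):

rac-server1$ srvctl status nodeapps -n rac-server5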

Remove the Clusterware Software

Removing the clusterware software is very similar to removing the database software. There are two separate things to do: (1) update the Oracle Inventory on all the nodes that remain; and (2) remove the clusterware from that node*. This second operation is required only if the node is to be reused.

Update the inventory of the remaining nodes

The syntax is identical to that of the database removal, except you have to add the CRS=TRUE directive as below:

rac-server1$ export ORA_CRS_HOME=/u01/app/crs
rac-server1$ cd $ORA_CRS_HOME/oui/bin
rac-server1$ ./runInstaller -silent -updateNodeList  \
     ORACLE_HOME=$ORA_CRS_HOME                       \ 
     "CLUSTER_NODES={rac-server1,rac-server2,rac-server3,rac-server4}" \
     CRS=TRUE

Once you’ve updated the inventory, OUI will never again prompt you for rac-server5 when used from any of those four nodes.

Delete the Clusterware Software from the node to be removed *

To delete the clusterware from the node you are removing, you must first stop it, if you haven’t already:

rac-server5$ su -
rac-server5# cd /u01/app/crs/bin
rac-server5# ./crsctl stop crs
rac-server5# ./crsctl disable crs

Then, update the inventory as below:

rac-server5$ export ORA_CRS_HOME=/u01/app/crs
rac-server5$ cd $ORA_CRS_HOME/oui/bin
rac-server5$ ./runInstaller -silent -updateNodeList \
             ORACLE_HOME=$ORA_CRS_HOME              \ 
             "CLUSTER_NODES={rac-server5}"          \
             CRS=TRUE                               \
             -local

Finally, run the OUI with the deinstall and CRS=TRUE directives as below:

rac-server5$ cat /etc/oraInst.loc
rac-server5$ cd /u01/app/oraInventory/ContentsXML
rac-server5$ grep NAME inventory.xml
rac-server5$ export ORA_CRS_HOME=/u01/app/crs
rac-server5$ cd $ORA_CRS_HOME/oui/bin
rac-server5$ ./runInstaller -silent -deinstall -removeallfiles \
              "REMOVE_HOMES={/u01/app/crs}"                   \
              CRS=TRUE

Additional cleanup *

If the node you are removing is still accessible and you plan to reuse it (say, for another cluster), there is some additional cleanup to do (a consolidated sketch follows the list):

  • delete any file that would remain in the Clusterware Home
  • delete any specific entries in the oracle .profile file
  • delete any specific entries in the crontab
  • delete the oraInst.loc file and the inventory
  • replace the inittab with the one backed up before the clusterware install: inittab.no_crs
  • delete the /var/tmp/.oracle directory
  • delete the Startup/Shutdown services, i.e., with Oracle or Redhat Enterprise Linux, all the /etc/init.d/init* and /etc/rc?.d/*init.crs files
  • delete the clusterware and ocr.loc files, i.e., with Oracle or Redhat Enterprise Linux, the /etc/oracle directory
  • delete the storage-specific configuration files to prevent altering the shared storage from that node (/etc/fstab for NFS, udev for ASM or the raw devices)
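As a consolidated sketch of some of the items above for Oracle or RedHat Enterprise Linux (run as root, and double-check every path against your own setup before deleting anything):

rac-server5# rm -rf /var/tmp/.oracle
rac-server5# rm -rf /etc/oracle
rac-server5# rm -f /etc/oraInst.loc
rac-server5# rm -rf /u01/app/oraInventory
rac-server5# rm -f /etc/init.d/init*        # clusterware startup scripts; check the matches first
rac-server5# rm -f /etc/rc?.d/*init.crs
rac-server5# cp /etc/inittab.no_crs /etc/inittab   # location of the backed-up inittab assumed to be /etc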

Remove the Node from the Cluster Configuration

Everything has been removed, but if you connect to any of the remaining nodes and run olsnodes, you’ll see the server is still registered in the OCR:

rac-server1$ cd /u01/app/crs/bin
rac-server1$ ./olsnodes -n -i
rac-server1    1    rac-server1-priv rac-server1-vip
rac-server2    2    rac-server2-priv rac-server2-vip
rac-server3    3    rac-server3-priv rac-server3-vip
rac-server4    4    rac-server4-priv rac-server4-vip
rac-server5    5    rac-server5-priv

In order to remove that server from the OCR, connect as root on any of the remaining nodes and use its name and number with the rootdeletenode.sh script as below:

rac-server1$ su -
rac-server1# cd /u01/app/crs/install
rac-server1# ./rootdeletenode.sh rac-server5,5
rac-server1# exit
rac-server1$ cd /u01/app/crs/bin
rac-server1$ ./olsnodes -n -i

More to come

This is it! In these last three posts, you’ve installed a 10.2 RAC, added one node, and removed one node without any X display. Any comments for now? I hope you’ll agree that it’s pretty easy once you’re used to it. You’ll probably start (if it wasn’t the case before) to leverage RAC’s ability to scale up and down according to your needs.

In the parts to follow, we’ll do the same with an 11.1 RAC.


* If you cannot access the server you are removing, don’t run this step.
