Pythian Blog: Technical Track

Oracle block corruptions, DBV vs RMAN

Over-the-Top Tales from the Trenches.
Motto: Bringing order to the chaos of every day DBA life.

Dear Diary,

Is that a whiff of decay in the air, the pungent odour of corruption? No… not me, the database.

Oracle prides itself on the robust nature of its database. Kill it off, or have a sudden power outage, and 99.9% of the time the db will pick itself up, clean off the dust of uncommitted transactions, and get back into the game.

There is nothing more nasty or dangerous to any database than a filesystem/disk with IO issues. To paraphase the bard, “Is the data to be or not to be? that is the question”. Did the block(s) get written on disk or did something else happen? Or did you get fries with that?

As a DBA, you are paid to wear the tin-foil hat occasionally to protect yourself from the soothing signals transmitted by disk caches and controllers, informing you that everything is ok… go back to sleep… your data is ok…

Is there an easy way to check for block corruptions and also logical corruptions (e.g., dodgy indexes) after the hint of filesystem IO issues?

The first thing that comes to mind is DBV. So you have 250+ datafiles. You hunt for a script to check each one, then trawl through the mountain of data DBV produces for each datafile.

On Unix you could run a shell script like:

#!/bin/bash
BLOCKSIZE=$1
DATADIR=$2
cd $DATADIR
ls -1 *.dbf | while read FILE
do
    dbv file=$FILE blocksize=$BLOCKSIZE
done

Call the shell script like this:

./dbv.sh 8192 /oracle/oradata/$ORACLE_SID >> dbv.log 2>&1

While that dbv.sh will work on most recent and no-so-recent versions of Oracle, there is another way, one which will populate a table in the db for you if there are any corruptions, and give you a reasonable ETA on the total amount of time required. This way is The Tao of RMAN.

RMAN is able to check the database for block and logical corruptions using a nice command called
BACKUP VALIDATE CHECK LOGICAL DATABASE.

In this example command file, RMAN uses 4 channels to speed up the process and potentially thrash your disk. A good option would be to use iostat -x 5 5, as described by Alex in this article, to determine when the disks are saturated, and modify the command file accordingly.

You could also use the RATE option to limit the disk rate like this, ALLOCATE CHANNEL c1 DEVICE TYPE DISK RATE=1500K;

Here is the command file:

run {
allocate channel c1 device type disk ;
allocate channel c2 device type disk ;
allocate channel c3 device type disk ;
allocate channel c4 device type disk ;
backup validate check logical database;
}

To run the command file and have the rman output go to a logfile, call it like this:

rman target / cmdfile rman_check_corrupt.cmd log rman_check_corrupt.log 2>&1 &

While RMAN does its thing, you can jump into the database and check how long Oracle estimates the long-running process will take, using this SQL:

select sid,start_time,elapsed_seconds,time_remaining, round(time_remaining/60,2) "Min",message from v$session_longops where time_remaining > 0
/

As RMAN proceeds, you can perform some stationary exercise at your desk, increase your blood pressure and heart rate by running

select count(*) from v$database_block_corruption;

So was it six corruptions or only five, are you feeling lucky?

The best part of RMAN is that is will run on primary and standby databases equally effectively. However, neither the mightly dbv nor even RMAN will help you with the insidous data guard corruption bug.

Have Fun!
Paul

Related posts in the series:

Creating standby databases using RMAN duplicate

The mysterious world of shmmax and shmall

No Comments Yet

Let us know what you think

Subscribe by email