Wednesday, April 21, 2010

Play with my toy 10g RAC IV - Test two OCR administration tasks

The Oracle Cluster Registry (OCR) contains the cluster node list, instance-to-node mappings, and Oracle Clusterware resource profiles for applications that may have been customized.

Task 1 - Mirror the OCR

In 10g RAC the OCR can have at most two locations: the primary OCR and one OCR mirror. Because my OCR is on an OCFS2 file system, I first need to create a file for the mirror before I can add it.

1) Verify that no OCR mirror is configured:

root@rac1:~ [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       5348
         Available space (kbytes) :     256796
         ID                       :  645781380
         Device/File Name         : /ocfs/clusterware/ocr
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded
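If you want to script this check, one option is to pull the configured locations out of the `ocrcheck` output: a single path means no mirror is configured. This is a hypothetical sketch (the names `ocr_locations` and `ocr_mirror_configured` are illustrative, not Oracle tools) that assumes the 10.2 output format shown above and reads the output on stdin, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, assuming the 10.2 `ocrcheck` format shown above:
# each configured location appears on a "Device/File Name : <path>" line,
# while an unconfigured mirror appears only as "Device/File not configured".

# Print one configured OCR path per line.
ocr_locations() {
    awk -F': ' '/Device\/File Name/ { print $2 }'
}

# Succeed (exit 0) only when at least two locations, i.e. OCR plus mirror,
# are configured.
ocr_mirror_configured() {
    [ "$(ocr_locations | wc -l)" -ge 2 ]
}
```

For example, `ocrcheck | ocr_locations` should print only /ocfs/clusterware/ocr at this point, and `ocrcheck | ocr_mirror_configured` should fail.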

2) Create an empty file for the OCR mirror:

dd if=/dev/zero of=/ocfs/clusterware/ocrmirror.dbf bs=1M count=128

root@rac1:/ocfs/clusterware [devdb1]# dd if=/dev/zero of=/ocfs/clusterware/ocrmirror.dbf bs=1M count=128
128+0 records in
128+0 records out
root@rac1:/ocfs/clusterware [devdb1]# ls -lh
total 144M
-rw-r-----  1 root   oinstall 5.5M Apr 15 10:15 ocr
-rw-r--r--  1 root   root     128M Apr 20 16:10 ocrmirror.dbf
-rw-r--r--  1 oracle oinstall 9.8M Apr 20 16:10 votingdisk
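Since `dd` will overwrite whatever path it is given, including the live OCR, it may be worth wrapping this step. A minimal sketch, assuming GNU `dd` (for `bs=1M`) and copying the mode 640 of the existing `ocr` file in the listing above; `precreate_ocr_file` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pre-create a zero-filled file to hold an OCR location
# on a cluster file system such as OCFS2. It refuses to touch an existing
# path, because running dd against the live OCR would destroy it.
precreate_ocr_file() {
    local path=$1
    local size_mb=${2:-128}   # 128 MB matches the file created above

    if [ -e "$path" ]; then
        echo "refusing to overwrite existing file: $path" >&2
        return 1
    fi

    # Write size_mb blocks of 1 MB zeros; dd reports record counts on stderr.
    dd if=/dev/zero of="$path" bs=1M count="$size_mb" || return 1

    # Match the mode of the existing ocr file (rw-r-----). Changing the owner
    # to root:oinstall requires root, so that is left to the caller.
    chmod 640 "$path"
}
```

Usage: `precreate_ocr_file /ocfs/clusterware/ocrmirror.dbf 128`, then `chown root:oinstall` the file as root.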


3) Add the OCR mirror:

root@rac1:/ocfs/clusterware [devdb1]# ocrconfig -replace ocrmirror /ocfs/clusterware/ocrmirror.dbf

root@rac1:/ocfs/clusterware [devdb1]# ls -lhtr
total 397M
-rw-r-----  1 root   oinstall 5.5M Apr 15 10:15 ocr
-rw-r--r--  1 root   root     381M Apr 20 16:13 ocrmirror.dbf
-rw-r--r--  1 oracle oinstall 9.8M Apr 20 16:14 votingdisk


Note: the size of ocrmirror.dbf grew from 128M to 381M after it was added, which is quite unusual.


4) Verify:

root@rac1:/ocfs/clusterware [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       5348
         Available space (kbytes) :     256796
         ID                       :  645781380
         Device/File Name         : /ocfs/clusterware/ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /ocfs/clusterware/ocrmirror.dbf
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded
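A scripted version of this verification could require that every configured location reports a successful integrity check and that the overall cluster registry check succeeded. Sketch only: `ocr_check_ok` is a hypothetical name, and it matches just the success messages seen above, so any other message for a location (whatever its exact wording) is treated as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ocrcheck` output on stdin and succeed only if
# every "Device/File Name" line is matched by a "Device/File integrity check
# succeeded" line and the overall registry check also succeeded.
ocr_check_ok() {
    awk '
        /Device\/File Name/                          { devices++ }
        /Device\/File integrity check succeeded/     { ok++ }
        /Cluster registry integrity check succeeded/ { cluster_ok = 1 }
        END { exit !(devices > 0 && ok == devices && cluster_ok) }
    '
}
```

Usage: `ocrcheck | ocr_check_ok && echo "OCR and mirror look healthy"`.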


5) Remove the OCR mirror (optional):

root@rac1:/ocfs/clusterware [devdb1]# ocrconfig -replace ocrmirror
root@rac1:/ocfs/clusterware [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       5348
         Available space (kbytes) :     256796
         ID                       :  645781380
         Device/File Name         : /ocfs/clusterware/ocr
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

root@rac1:/ocfs/clusterware [devdb1]# rm ocrmirror.dbf
rm: remove regular file `ocrmirror.dbf'? yes
root@rac1:/ocfs/clusterware [devdb1]#
root@rac1:/ocfs/clusterware [devdb1]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             512M   90M  423M  18% /ocfs




Task 2 - Back up and restore the OCR

1) As a precaution, generate a logical OCR backup (export) file:

[oracle@rac1 ~]$ sudo /u01/app/oracle/product/10.2.0/crs_1/bin/ocrconfig -export ./backup/logicalocrbak
Password:
[oracle@rac1 ~]$ ls -lh ./backup/
total 128K
-rw-r--r--  1 root root 122K Apr 21 15:16 logicalocrbak
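To keep several exports side by side, the export file name can carry a timestamp. A minimal sketch: `export_ocr` and the `OCRCONFIG` variable are hypothetical (set `OCRCONFIG` to `$ORA_CRS_HOME/bin/ocrconfig`), and the function has to run as root, as the `sudo` above does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: export the OCR to a timestamped file so repeated
# exports do not overwrite each other. Must run as root.
# OCRCONFIG is an assumption: point it at $ORA_CRS_HOME/bin/ocrconfig.
OCRCONFIG=${OCRCONFIG:-ocrconfig}

export_ocr() {
    local dir=$1 file
    if [ -z "$dir" ]; then
        echo "usage: export_ocr DIR" >&2
        return 1
    fi

    file="$dir/logicalocrbak_$(date +%Y%m%d_%H%M%S)"
    mkdir -p "$dir" || return 1
    "$OCRCONFIG" -export "$file" || return 1

    # Print the path so it can be recorded for a later -import.
    printf '%s\n' "$file"
}
```

Usage: `sudo bash -c '. ./export_ocr.sh; export_ocr /home/oracle/backup'`, where `export_ocr.sh` is a hypothetical file holding the function above.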


2) Delete the OCR file to simulate losing the OCR:

root@rac1:/ocfs/clusterware [devdb1]# ls -lhtr
total 138M
-rw-r--r--  1 root   root     128M Apr 21 15:17 ocr
-rw-r--r--  1 oracle oinstall 9.8M Apr 21 15:17 votingdisk
root@rac1:/ocfs/clusterware [devdb1]# rm -f ocr
root@rac1:/ocfs/clusterware [devdb1]# ls -lhtr
total 9.8M
-rw-r--r--  1 oracle oinstall 9.8M Apr 21 15:17 votingdisk

[oracle@rac1 ~]$ ocrcheck
PROT-602: Failed to retrieve data from the cluster registry


3) Locate a physical backup of the OCR:

[oracle@rac1 ~]$ ocrconfig -showbackup

rac2     2010/04/20 17:58:54     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

rac2     2010/04/19 19:47:53     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

rac2     2010/04/19 15:47:51     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

rac2     2010/04/19 11:47:47     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

rac2     2010/04/19 15:47:51     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs
[oracle@rac1 ~]$ ssh rac2 ls -lhtr /u01/app/oracle/product/10.2.0/crs_1/cdata/crs
total 37M
-rw-r--r--  1 root root 4.6M Apr 12 21:14 week_.ocr
-rw-r--r--  1 root root 5.4M Apr 19 11:47 day.ocr
-rw-r--r--  1 root root 5.4M Apr 19 15:47 backup02.ocr
-rw-r--r--  1 root root 5.4M Apr 19 15:47 week.ocr
-rw-r--r--  1 root root 5.4M Apr 19 19:47 backup01.ocr
-rw-r--r--  1 root root 5.4M Apr 20 17:58 backup00.ocr
-rw-r--r--  1 root root 5.4M Apr 20 17:58 day_.ocr


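Note: Oracle Clusterware automatically creates OCR backups every four hours and always retains the three most recent four-hourly backups (backup00.ocr to backup02.ocr) plus a one-day-old and a one-week-old copy (day.ocr and week.ocr). The default location is CRS_home/cdata/cluster_name, here /u01/app/oracle/product/10.2.0/crs_1/cdata/crs. All the backups listed above were written on rac2, so that is the node to restore from.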
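Picking the newest automatic backup from the `-showbackup` listing can also be scripted. This sketch assumes the four-column 10.2 format shown above (node, date, time, directory); because dates and times are zero-padded, a reverse byte-order sort puts the most recent entry first. `latest_ocr_backup` is a hypothetical name, and it returns only the node and directory, since the listing does not name the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ocrconfig -showbackup` output on stdin
# (node, YYYY/MM/DD, HH:MM:SS, directory) and print "node directory"
# for the most recent backup. Blank lines are skipped by the NF test.
latest_ocr_backup() {
    awk 'NF >= 4 { print $2, $3, $1, $4 }' |  # put the timestamp first
        LC_ALL=C sort -r |                    # newest timestamp first
        head -n 1 |
        awk '{ print $3, $4 }'                # back to node and directory
}
```

For example, `ocrconfig -showbackup | latest_ocr_backup` should print `rac2 /u01/app/oracle/product/10.2.0/crs_1/cdata/crs` here; listing that directory on that node, as done above, then shows the file names.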

4) Stop CRS resources on both nodes

[oracle@rac1 ~]$ crs_stop -all
Attempting to stop `ora.devdb.SNOLBA.devdb1.srv` on member `rac1`
Attempting to stop `ora.devdb.SLBA.devdb1.srv` on member `rac1`
Attempting to stop `ora.devdb.SNOLBA.cs` on member `rac1`
Attempting to stop `ora.rac1.gsd` on member `rac1`
Attempting to stop `ora.devdb.SLBA.cs` on member `rac1`
Attempting to stop `ora.devdb.SLBA.devdb2.srv` on member `rac2`
Attempting to stop `ora.devdb.SNOLBA.devdb2.srv` on member `rac2`
Stop of `ora.rac1.gsd` on member `rac1` succeeded.
Stop of `ora.devdb.SLBA.devdb2.srv` on member `rac2` succeeded.
Stop of `ora.devdb.SLBA.devdb1.srv` on member `rac1` succeeded.
Stop of `ora.devdb.SNOLBA.devdb2.srv` on member `rac2` succeeded.
Stop of `ora.devdb.SNOLBA.devdb1.srv` on member `rac1` succeeded.
Attempting to stop `ora.rac1.ons` on member `rac1`
Attempting to stop `ora.rac2.gsd` on member `rac2`
Attempting to stop `ora.rac2.ons` on member `rac2`
Stop of `ora.rac2.gsd` on member `rac2` succeeded.
Attempting to stop `ora.devdb.db` on member `rac2`
Stop of `ora.rac1.ons` on member `rac1` succeeded.
Stop of `ora.rac2.ons` on member `rac2` succeeded.
Stop of `ora.devdb.SLBA.cs` on member `rac1` succeeded.
Stop of `ora.devdb.SNOLBA.cs` on member `rac1` succeeded.
Stop of `ora.devdb.db` on member `rac2` succeeded.
Attempting to stop `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
Attempting to stop `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`
Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
`ora.devdb.devdb1.inst` is already OFFLINE.
Attempting to stop `ora.rac1.ASM1.asm` on member `rac1`
Stop of `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded.
`ora.devdb.devdb2.inst` is already OFFLINE.
Attempting to stop `ora.rac2.ASM2.asm` on member `rac2`
Stop of `ora.rac1.ASM1.asm` on member `rac1` succeeded.
Attempting to stop `ora.rac1.vip` on member `rac1`
Stop of `ora.rac2.ASM2.asm` on member `rac2` succeeded.
Stop of `ora.rac1.vip` on member `rac1` succeeded.
Attempting to stop `ora.rac2.vip` on member `rac2`
Stop of `ora.rac2.vip` on member `rac2` succeeded.
CRS-0216: Could not stop resource 'ora.devdb.devdb1.inst'.

CRS-0216: Could not stop resource 'ora.devdb.devdb2.inst'.

[oracle@rac1 ~]$ ./crs_rep.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             OFFLINE    OFFLINE
ora.devdb.SLBA.devdb1.srv                     OFFLINE    OFFLINE
ora.devdb.SLBA.devdb2.srv                     OFFLINE    OFFLINE
ora.devdb.SNOLBA.cs                           OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb1.srv                   OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb2.srv                   OFFLINE    OFFLINE
ora.devdb.db                                  OFFLINE    OFFLINE
ora.devdb.devdb1.inst                         OFFLINE    OFFLINE
ora.devdb.devdb2.inst                         OFFLINE    OFFLINE
ora.rac1.ASM1.asm                             OFFLINE    OFFLINE
ora.rac1.LISTENER_RAC1.lsnr                   OFFLINE    OFFLINE
ora.rac1.gsd                                  OFFLINE    OFFLINE
ora.rac1.ons                                  OFFLINE    OFFLINE
ora.rac1.vip                                  OFFLINE    OFFLINE
ora.rac2.ASM2.asm                             OFFLINE    OFFLINE
ora.rac2.LISTENER_RAC2.lsnr                   OFFLINE    OFFLINE
ora.rac2.gsd                                  OFFLINE    OFFLINE
ora.rac2.ons                                  OFFLINE    OFFLINE
ora.rac2.vip                                  OFFLINE    OFFLINE
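crs_rep.sh is a local script whose source is not included in this post. A hypothetical reconstruction that produces the same three-column layout, assuming the usual `crs_stat` output of NAME=, TYPE=, TARGET= and STATE= lines per resource, might look like this:

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of a crs_rep.sh-style report (not the
# original script). Reads `crs_stat` output on stdin, where each resource
# is a block of NAME=, TYPE=, TARGET= and STATE= lines, and prints one
# row per resource.
crs_rep() {
    awk -F'=' '
        BEGIN {
            fmt = "%-45s %-10s %s\n"
            printf fmt, "HA Resource", "Target", "State"
            printf fmt, "-----------", "------", "-----"
        }
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        # STATE is the last line of each block, so the row is printed here.
        $1 == "STATE"  { printf fmt, name, target, $2 }
    '
}
```

Usage: `$ORA_CRS_HOME/bin/crs_stat | crs_rep`.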



5) Stop CRS on both nodes

The following command failed because the OCR had been deleted:

[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl stop crs
Password:
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]

Note: normally we should see:

[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.


I then killed the ocssd.bin processes on both nodes at the OS level, and both servers rebooted automatically. After the reboot, no CRS processes were running (on rac1 the only match is the grep command itself):

[oracle@rac1 ~]$ ps -ef | grep d.bin
oracle    8887  8391  0 15:36 pts/1    00:00:00 grep d.bin
[oracle@rac1 ~]$ ssh rac2 ps -ef | grep d.bin
[oracle@rac1 ~]$
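A slightly more precise check lists only the Clusterware daemons seen in this post (crsd.bin, evmd.bin and ocssd.bin) and, unlike `grep d.bin`, never matches the grep process. This sketch assumes standard `ps -ef` columns with the command path in the eighth field; `crs_daemons_running` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps -ef` output on stdin and print "PID name"
# for each crsd.bin, evmd.bin or ocssd.bin process. Exits 0 if at least
# one daemon was found, 1 otherwise.
crs_daemons_running() {
    awk '
        {
            cmd = $8
            sub(/.*\//, "", cmd)   # strip the directory part of the command
            if (cmd ~ /^(crsd|evmd|ocssd)\.bin$/) { print $2, cmd; found = 1 }
        }
        END { exit !found }
    '
}
```

Usage: `ps -ef | crs_daemons_running || echo "no CRS daemons on rac1"`, and likewise `ssh rac2 ps -ef | crs_daemons_running`.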



6) Restore the OCR using the backup identified in step 3

Because my OCR resides on an OCFS2 file system, I need to create the file first.

root@rac1:/ocfs/clusterware [devdb1]# dd if=/dev/zero of=/ocfs/clusterware/ocr bs=1M count=128
128+0 records in
128+0 records out
root@rac1:/ocfs/clusterware [devdb1]# ls -lhtr
total 138M
-rw-r--r--  1 oracle oinstall 9.8M Apr  5 10:17 votingdisk
-rw-r--r--  1 root   root     128M Apr 21 15:41 ocr
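Before running the restore, a quick pre-flight check can catch two easy mistakes in this procedure: pointing at a missing or empty backup, or forgetting to pre-create the OCR file on OCFS2. `ocr_restore_precheck` is a hypothetical helper; it does not check that the CRS daemons are down, and it should be run on the node that holds the backup (rac2 here).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before `ocrconfig -restore`:
#   - the backup must be a readable, non-empty file
#   - the target OCR file must already exist (it is pre-created on OCFS2)
ocr_restore_precheck() {
    local backup=$1 target=$2

    if [ ! -r "$backup" ] || [ ! -s "$backup" ]; then
        echo "backup missing, unreadable or empty: $backup" >&2
        return 1
    fi

    if [ ! -f "$target" ]; then
        echo "target OCR file does not exist yet: $target" >&2
        return 1
    fi
}
```

Usage: `ocr_restore_precheck /u01/app/oracle/product/10.2.0/crs_1/cdata/crs/backup00.ocr /ocfs/clusterware/ocr && echo "ready to restore"`.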


Then I can restore from the backup. The command runs on rac2 because that is where the automatic backups are stored:


[oracle@rac1 clusterware]$ ssh rac2 sudo $ORA_CRS_HOME/bin/ocrconfig -restore /u01/app/oracle/product/10.2.0/crs_1/cdata/crs/backup00.ocr
Password:




[oracle@rac1 ~]$ cluvfy comp ocr -n all -verbose

Verifying OCR integrity

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.



Note: to restore from the logical backup taken in step 1 instead, run ocrconfig -import as root with Oracle Clusterware stopped on all nodes, e.g.: ocrconfig -import /home/oracle/backup/logicalocrbak

7) Restart CRS on both nodes

[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:
Attempting to start CRS stack
The CRS stack will be started shortly
[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:

Attempting to start CRS stack
The CRS stack will be started shortly

[oracle@rac1 ~]$ ps -ef | grep d.bin
root      6591     1  0 15:29 ?        00:00:04 /u01/app/oracle/product/10.2.0/crs_1/bin/crsd.bin reboot
oracle    9941  6589  0 15:45 ?        00:00:01 /u01/app/oracle/product/10.2.0/crs_1/bin/evmd.bin
oracle   10071 10045  0 15:45 ?        00:00:01 /u01/app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
oracle   20444  8391  0 15:51 pts/1    00:00:00 grep d.bin



After several attempts at using SRVCTL to shut down and bring up individual components, and manually shutting down the ASM and rac2 instances, I finally got:

[oracle@rac1 ~]$ ./crs_rep.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             ONLINE     ONLINE on rac1
ora.devdb.SLBA.devdb1.srv                     ONLINE     ONLINE on rac1
ora.devdb.SLBA.devdb2.srv                     ONLINE     ONLINE on rac2
ora.devdb.SNOLBA.cs                           ONLINE     ONLINE on rac1
ora.devdb.SNOLBA.devdb1.srv                   ONLINE     ONLINE on rac1
ora.devdb.SNOLBA.devdb2.srv                   ONLINE     ONLINE on rac2
ora.devdb.db                                  ONLINE     ONLINE on rac2
ora.devdb.devdb1.inst                         ONLINE     ONLINE on rac1
ora.devdb.devdb2.inst                         ONLINE     ONLINE on rac2
ora.rac1.ASM1.asm                             ONLINE     ONLINE on rac1
ora.rac1.LISTENER_RAC1.lsnr                   ONLINE     ONLINE on rac1
ora.rac1.gsd                                  ONLINE     ONLINE on rac1
ora.rac1.ons                                  ONLINE     ONLINE on rac1
ora.rac1.vip                                  ONLINE     ONLINE on rac1
ora.rac2.ASM2.asm                             ONLINE     ONLINE on rac2
ora.rac2.LISTENER_RAC2.lsnr                   ONLINE     ONLINE on rac2
ora.rac2.gsd                                  ONLINE     ONLINE on rac2
ora.rac2.ons                                  ONLINE     ONLINE on rac2
ora.rac2.vip                                  ONLINE     ONLINE on rac2
[oracle@rac1 ~]$
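While bringing resources back one at a time, it can help to filter this report for anything not yet ONLINE. A sketch assuming the three-column layout above (two header lines, then name, target and state); `crs_not_online` is a hypothetical name, and it returns non-zero while any resource is still down.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a crs_rep.sh-style report on stdin and print
# every resource whose State column does not start with ONLINE.
# Exits 0 only when all resources are ONLINE.
crs_not_online() {
    awk '
        NR <= 2 || NF < 3 { next }   # skip header, underline and blank lines
        $3 != "ONLINE"    { print $1, $2, $3; bad = 1 }
        END               { exit bad }
    '
}
```

Usage: `./crs_rep.sh | crs_not_online && echo "all resources ONLINE"`.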



