Tuesday, May 04, 2010

Play with my toy 10g RAC VI - Failed to Start CRS Processes due to OCR Out-of-Sync

Today was the first time I had tried to play with my toy RAC since I moved the OCR and voting disk files to raw devices, as described in an earlier post. To my surprise, I could not start the CRS processes (i.e., the ones listed by ps -ef | grep d.bin)! First, they should have started automatically when the server rebooted, but they did not. Second, they did not start when I issued "crsctl start crs" or "/etc/init.d/init.crs start" either. It took me quite a while to find the log that pointed to the problem:

[oracle@rac1 client]$ pwd
/u01/app/oracle/product/10.2.0/crs_1/log/rac1/client
[oracle@rac1 client]$
[oracle@rac1 client]$ tail -10 css.log
2010-04-05 10:15:01.231: [ CSSCLNT][3086931648]clsssInitNative: connect failed, rc 9

2010-04-05 10:15:02.249: [ CSSCLNT][3086931648]clsssInitNative: connect failed, rc 9

2010-04-21 13:44:32.807: [ default][3068479168]prlsndmain: olsnodes successful!!
2010-04-28 11:13:01.999: [  OCRRAW][3086931648]propriogid:1: INVALID FORMAT
2010-04-28 12:02:36.960: [  OCRRAW][3086931648]propriogid:1: INVALID FORMAT
2010-05-04 10:58:31.850: [  OCRRAW][3068479168]propriogid:1: INVALID FORMAT
2010-05-04 10:58:31.932: [ default][3068479168]prlsndmain: olsnodes successful!!
2010-05-04 11:06:06.374: [  OCRRAW][3086931648]propriogid:1: INVALID FORMAT

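If you hit the same symptom, a quick way to pull these OCR errors out of the client log is a small grep wrapper. This is a minimal bash sketch of my own, not an Oracle tool; find_ocr_errors is a hypothetical helper name, and the pattern simply matches the "OCRRAW ... INVALID FORMAT" lines shown above, so other releases may word their errors differently:

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not part of CRS):
# print OCRRAW "INVALID FORMAT" entries from a CRS client log.
# Reads the file given as $1, or stdin when no argument is passed.
find_ocr_errors() {
    # Pattern matches the log lines quoted in this post only.
    grep -E 'OCRRAW.*INVALID FORMAT' "${1:--}"
}

# Example, using this post's CRS_HOME and node name (adjust for yours):
# find_ocr_errors /u01/app/oracle/product/10.2.0/crs_1/log/rac1/client/css.log
```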

The repeated "OCRRAW ... INVALID FORMAT" entries showed that the OCR had problems. Running ocrcheck, I then noticed "Device/File needs to be synchronized with the other device" under /dev/raw/raw4:

root@rac1:/u01/app/oracle/product/10.2.0/crs_1/log/rac1/client [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     505928
         Used space (kbytes)      :       5372
         Available space (kbytes) :     500556
         ID                       :  645781380
         Device/File Name         : /dev/raw/raw4
                                    Device/File needs to be synchronized with the other device
         Device/File Name         : /dev/raw/raw5
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

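Because the hint only appears as a status line under the device name, it is easy to miss in the output. A minimal sketch that picks out the flagged device, assuming ocrcheck prints each "Device/File Name" line followed by its status line exactly as above (ocr_devices_needing_sync is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): print each OCR device/file that
# ocrcheck flags with "needs to be synchronized".
# Assumes the 10.2 ocrcheck layout shown above: a "Device/File Name"
# line followed by that device's status line.
ocr_devices_needing_sync() {
    awk '
        /Device\/File Name/        { dev = $NF }   # remember the last device seen
        /needs to be synchronized/ { print dev }   # report it if flagged
    ' "${1:--}"
}

# Usage (as a user allowed to run ocrcheck):
# ocrcheck | ocr_devices_needing_sync
```

Empty output means no device was flagged; fed the ocrcheck output above, it would print /dev/raw/raw4.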

I then copied the good device (raw5) over the out-of-sync one (raw4) with dd:

# dd if=/dev/raw/raw5 of=/dev/raw/raw4

After that, ocrcheck was clean again:
[oracle@rac1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     505928
         Used space (kbytes)      :       5348
         Available space (kbytes) :     500580
         ID                       :  219149972
         Device/File Name         : /dev/raw/raw4
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw5
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

I am not sure this is the right or best way to fix the problem, but it worked: after the dd, I was able to bring everything back to normal. BTW, I had also tried "ocrconfig -replace ocr /dev/raw/raw5", but it did not work. In fact, the problem had been there since I created the ocr and ocrmirror on the raw devices; I just did not notice it at the time.
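To confirm the stack really came back, the CRS processes mentioned at the top can be checked with a small script instead of eyeballing ps -ef | grep d.bin. A minimal sketch, assuming the daemons to look for are crsd.bin, evmd.bin and ocssd.bin (my expectation for 10.2 on Linux; adjust the list to what your own ps output shows). missing_crs_daemons is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): print the name of each expected
# CRS daemon that does not appear in the given ps listing.
# Assumption: the daemons are crsd.bin, evmd.bin and ocssd.bin.
missing_crs_daemons() {
    local ps_out d
    ps_out=$(cat "${1:--}")                 # ps output from $1 or stdin
    for d in crsd.bin evmd.bin ocssd.bin; do
        # -F treats the name as a fixed string, so "." is literal
        printf '%s\n' "$ps_out" | grep -qF "$d" || echo "$d"
    done
}

# Usage: no output means all three daemons were found.
# ps -ef | missing_crs_daemons
```

Since grep only runs on text already captured from ps, it cannot match its own command line, a common false positive with a plain ps -ef | grep d.bin.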
