Friday, May 07, 2010

Play with my toy 10g RAC VII - Voting Disk

The voting disk is a shared raw disk partition, or a file on a clustered file system, that is accessible to all nodes in the cluster. Its primary purpose is to resolve situations where private network communication fails. When that happens, the cluster cannot keep all nodes available, because the nodes can no longer coordinate I/O to the shared disks, so some of them must go offline. The voting disk is used to communicate the node state information that determines which nodes go offline.


If we have 3 nodes, each node presumably writes a message to the voting disk along the following lines:


Node 1 writes : I can see Node 2 & 3
Node 2 writes : I can see Node 1 & 3
Node 3 writes : I can see Node 1 & 2


If, for example, Node 3's private network has a problem, the messages may become:


Node 1 writes : I can see Node 2 only
Node 2 writes : I can see Node 1 only
Node 3 writes : I cannot see Node 1 or Node 2 (or it writes nothing at all)


In this situation, clearly Node 3 should be evicted from the cluster.




To avoid a single point of failure, we can multiplex the voting disk. By design, the cluster is fine as long as strictly more than half of the voting disks are up and contain consistent information. That is to say, with 5 voting disks we can tolerate at most 2 voting disk failures.
So number_of_voting_disks = number_of_tolerable_disk_failures * 2 + 1.
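
As a side note (not demonstrated in this test), voting disk copies can be added or removed with crsctl. My understanding is that in 10g this must be done with the CRS stack down on all nodes and with the -force option; /dev/raw/raw9 below is only a placeholder device, not part of my configuration:

crsctl add css votedisk /dev/raw/raw9 -force
crsctl query css votedisk
crsctl delete css votedisk /dev/raw/raw9 -force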


This post documents my test of the following voting disk administration task:


Task - Recover from the loss of voting disks




1. Check the current voting disk configuration


[oracle@rac1 backup]$ crsctl query css votedisk
 0.     0    /dev/raw/raw6
 1.     0    /dev/raw/raw7
 2.     0    /dev/raw/raw8

located 3 votedisk(s).




2. Backup voting disk




[oracle@rac1 backup]$ dd if=/dev/raw/raw6 of=/home/oracle/backup/votingdisk_050710
80325+0 records in
80325+0 records out
[oracle@rac1 backup]$
[oracle@rac1 backup]$ ls -lhtr
total 40M
-rw-r--r--  1 oracle oinstall 40M May  7 16:23 votingdisk_050710
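
The command above backs up only the first copy. A more complete backup (just a sketch, assuming the same raw device names shown in step 1) would dump each voting disk to its own file:

for v in raw6 raw7 raw8
do
  dd if=/dev/raw/$v of=/home/oracle/backup/votingdisk_${v}_050710
done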



3. Wipe out the first voting disk


dd if=/dev/zero of=/dev/raw/raw6


Note: I have three voting disk files, so in my understanding the cluster should survive one voting disk failure. However, rac1 and rac2 rebooted right after I issued this command, and I don't know why.


--------------- RAC 1 alert log  --------------------
 
Fri May  7 16:25:21 2010
Trace dumping is performing id=[cdmp_20100507162519]
Fri May  7 16:25:23 2010
Error: KGXGN aborts the instance (6)
Fri May  7 16:25:24 2010
Errors in file /u01/app/oracle/admin/devdb/bdump/devdb1_lmon_10476.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702


--------------- RAC 2 alert log --------------------
Fri May  7 16:25:19 2010
Error: KGXGN aborts the instance (6)
Fri May  7 16:25:19 2010
Error: unexpected error (6) from the Cluster Service (LCK0)
Fri May  7 16:25:19 2010
Errors in file /u01/app/oracle/admin/devdb/bdump/devdb2_lmon_3150.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri May  7 16:25:19 2010
Errors in file /u01/app/oracle/admin/devdb/bdump/devdb2_lck0_3236.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri May  7 16:25:19 2010
LMON: terminating instance due to error 29702
Fri May  7 16:25:21 2010
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/admin/devdb/bdump/devdb2_diag_3146.trc
Fri May  7 16:31:01 2010





4. Restart CRS stack


[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl stop crs
Password:
Stopping resources.

Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[oracle@rac1 ~]$
[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl stop crs
Password:vz123ys

Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[oracle@rac1 ~]$
[oracle@rac1 ~]$
[oracle@rac1 ~]$
[oracle@rac1 ~]$ ps -ef | grep d.bin
oracle   14672 30539  0 16:56 pts/1    00:00:00 grep d.bin
[oracle@rac1 ~]$ ssh rac2 ps -ef | grep d.bin
[oracle@rac1 ~]$ ./crsstat.sh
HA Resource                                   Target     State
-----------                                   ------     -----
error connecting to CRSD at [(ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))] clsccon 184

[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly


[oracle@rac1 ~]$ ps -ef | grep d.bin
root     14242     1  0 16:54 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/crsd.bin reboot
oracle   15219 14240  2 16:58 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/evmd.bin
oracle   15383 15357  2 16:58 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
oracle   15602 30539  0 16:58 pts/1    00:00:00 grep d.bin
[oracle@rac1 ~]$ ssh rac2 ps -ef | grep d.bin
root     23610     1  0 16:56 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/crsd.bin reboot
oracle   24394 23609  2 16:58 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/evmd.bin
oracle   24575 24549  2 16:58 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
[oracle@rac1 ~]$ ./crsstat.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             OFFLINE    OFFLINE
ora.devdb.SLBA.devdb1.srv                     OFFLINE    OFFLINE
ora.devdb.SLBA.devdb2.srv                     OFFLINE    OFFLINE
ora.devdb.SNOLBA.cs                           OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb1.srv                   OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb2.srv                   OFFLINE    OFFLINE
ora.devdb.db                                  ONLINE     ONLINE on rac2
ora.devdb.devdb1.inst                         ONLINE     ONLINE on rac1
ora.devdb.devdb2.inst                         ONLINE     ONLINE on rac2
ora.rac1.ASM1.asm                             ONLINE     ONLINE on rac1
ora.rac1.LISTENER_RAC1.lsnr                   ONLINE     ONLINE on rac1
ora.rac1.gsd                                  ONLINE     ONLINE on rac1
ora.rac1.ons                                  ONLINE     ONLINE on rac1
ora.rac1.vip                                  ONLINE     ONLINE on rac1
ora.rac2.ASM2.asm                             ONLINE     ONLINE on rac2
ora.rac2.LISTENER_RAC2.lsnr                   ONLINE     ONLINE on rac2
ora.rac2.gsd                                  ONLINE     ONLINE on rac2
ora.rac2.ons                                  ONLINE     ONLINE on rac2
ora.rac2.vip                                  ONLINE     ONLINE on rac2






5. Check log


[oracle@rac1 rac1]$ tail alertrac1.log
2010-05-07 16:58:50.358
[cssd(15383)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2010-05-07 16:58:54.722
[crsd(14242)]CRS-1201:CRSD started on node rac1.
2010-05-07 16:59:45.626
[cssd(15383)]CRS-1604:CSSD voting file is offline: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:00:47.657
[cssd(15383)]CRS-1604:CSSD voting file is offline: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:01:49.730
[cssd(15383)]CRS-1604:CSSD voting file is offline: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
[oracle@rac1 rac1]$ tail /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log
[    CSSD]2010-05-07 17:00:16.337 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8358600) proc(0x8389ac8) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:00:22.183 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8358600) proc(0x838eb90) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:00:30.776 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8358600) proc(0x838ee90) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:00:47.657 [62401440] >TRACE:   clssnmDiskStateChange: state from 3 to 3 disk (0//dev/raw/raw6)
[    CSSD]2010-05-07 17:01:07.263 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8358600) proc(0x837e6b8) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:01:09.009 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8357dd8) proc(0x8379340) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:01:49.730 [62401440] >TRACE:   clssnmDiskStateChange: state from 3 to 3 disk (0//dev/raw/raw6)
[    CSSD]2010-05-07 17:02:09.984 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x835a580) proc(0x8365a50) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:02:51.784 [62401440] >TRACE:   clssnmDiskStateChange: state from 3 to 3 disk (0//dev/raw/raw6)
[    CSSD]2010-05-07 17:03:12.292 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x835a580) proc(0x8365a50) pid() proto(10:2:1:1)



Note: similar messages appear in alertrac2.log and ocssd.log on rac2. This shows that with three voting disk files, the RAC is still functioning when one of them is unavailable.




6. Wipe out the second voting disk


dd if=/dev/zero of=/dev/raw/raw7


Both nodes rebooted right after the above command was issued. After the reboot, only evmd is running:


[oracle@rac1 ~]$ ps -ef | grep d.bin
oracle    8139  6985  3 17:12 ?        00:00:14 /u01/app/oracle/product/10.2.0/crs_1/bin/evmd.bin
oracle   11075 10255  0 17:20 pts/1    00:00:00 grep d.bin




7. Check log


--- alertrac1.log shows that two voting disk files are offline

2010-05-07 17:12:14.926
[cssd(8303)]CRS-1604:CSSD voting file is offline: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:12:15.099
[cssd(8303)]CRS-1604:CSSD voting file is offline: /dev/raw/raw7. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:12:15.147
[cssd(8303)]CRS-1605:CSSD voting file is online: /dev/raw/raw8. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
[oracle@rac1 rac1]$

[oracle@rac1 crsd]$ tail crsd.log

2010-05-07 17:14:04.532: [ COMMCRS][36494240]clsc_connect: (0x8655528) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))

2010-05-07 17:14:04.532: [ CSSCLNT][3086931648]clsssInitNative: connect failed, rc 9

2010-05-07 17:14:04.533: [  CRSRTI][3086931648]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2010-05-07 17:14:05.536: [ CRSMAIN][3086931648][PANIC]0CRSD exiting: Could not init the CSS context

2010-05-07 17:14:05.540: [ default][3086931648]Terminating clsd session







8. Restore Voting Disk


[oracle@rac1 ~]$ dd if=/home/oracle/backup/votingdisk_050710 of=/dev/raw/raw6
80325+0 records in
80325+0 records out
[oracle@rac1 ~]$ dd if=/home/oracle/backup/votingdisk_050710 of=/dev/raw/raw7
dd: writing to `/dev/raw/raw7': No space left on device
80263+0 records in
80262+0 records out
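
If no dd backup had been available, my understanding is that the damaged copy could instead be dropped and re-added with crsctl, again with the CRS stack down on all nodes and the -force option. This is only a hedged sketch that I did not test here, reusing the same device name:

crsctl delete css votedisk /dev/raw/raw7 -force
crsctl add css votedisk /dev/raw/raw7 -force
crsctl query css votedisk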





9. Restart CRS


[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:
Attempting to start CRS stack
The CRS stack will be started shortly
[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:vz123ys

Attempting to start CRS stack
The CRS stack will be started shortly


---- in alertrac1.log --------------------------

[oracle@rac1 rac1]$ pwd
/u01/app/oracle/product/10.2.0/crs_1/log/rac1
[oracle@rac1 rac1]$ tail -15 alertrac1.log
[cssd(8303)]CRS-1605:CSSD voting file is online: /dev/raw/raw8. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:29:31.679
[cssd(13301)]CRS-1605:CSSD voting file is online: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:29:31.714
[cssd(13301)]CRS-1605:CSSD voting file is online: /dev/raw/raw7. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:29:31.729
[cssd(13301)]CRS-1605:CSSD voting file is online: /dev/raw/raw8. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:29:35.433
[cssd(13301)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2010-05-07 17:29:38.247
[crsd(8910)]CRS-1012:The OCR service started on node rac1.
2010-05-07 17:29:38.287
[evmd(13364)]CRS-1401:EVMD started on node rac1.
2010-05-07 17:31:02.432
[crsd(8910)]CRS-1201:CRSD started on node rac1.

---- CRS resources are online

[oracle@rac1 ~]$ ~/crsstat.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             OFFLINE    OFFLINE
ora.devdb.SLBA.devdb1.srv                     OFFLINE    OFFLINE
ora.devdb.SLBA.devdb2.srv                     OFFLINE    OFFLINE
ora.devdb.SNOLBA.cs                           OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb1.srv                   OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb2.srv                   OFFLINE    OFFLINE
ora.devdb.db                                  ONLINE     ONLINE on rac2
ora.devdb.devdb1.inst                         ONLINE     ONLINE on rac1
ora.devdb.devdb2.inst                         ONLINE     ONLINE on rac2
ora.rac1.ASM1.asm                             ONLINE     ONLINE on rac1
ora.rac1.LISTENER_RAC1.lsnr                   ONLINE     ONLINE on rac1
ora.rac1.gsd                                  ONLINE     ONLINE on rac1
ora.rac1.ons                                  ONLINE     ONLINE on rac1
ora.rac1.vip                                  ONLINE     ONLINE on rac1
ora.rac2.ASM2.asm                             ONLINE     ONLINE on rac2
ora.rac2.LISTENER_RAC2.lsnr                   ONLINE     ONLINE on rac2
ora.rac2.gsd                                  ONLINE     ONLINE on rac2
ora.rac2.ons                                  ONLINE     ONLINE on rac2
ora.rac2.vip                                  ONLINE     ONLINE on rac2
[oracle@rac1 ~]$ date
Fri May  7 17:37:35 EDT 2010
 




In conclusion, with 3 voting disks my test showed that RAC can remain operational when one of them is offline; when two of them are unavailable, the CRS daemon cannot start at all. However, zeroing out a voting disk caused both servers to reboot, which is not desirable; I am not sure whether this is specific to my environment.
