Monday, July 14, 2008

RAC: misconfigured or faulty interconnect

This note documents the symptoms of a misconfigured or faulty interconnect. The example is based on an Oracle OpenWorld presentation, "RAC Performances Experts Reveal All", which I found on the web.

A misconfigured or faulty interconnect is one of the common problems with a RAC database. It causes the "lost blocks" problem, whose symptoms can be identified in the following ways:

(1) ifconfig indicating NIC receive errors

ifconfig -a:

eth0 Link encap:Ethernet HWaddr 00:0B:DB:4B:A2:04
inet addr:130.35.25.110 Bcast:130.35.27.255 Mask:255.255.252.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0


Notice errors:135 (and frame:95) on the RX line.

Note: ifconfig command help: http://www.computerhope.com/unix/uifconfi.htm
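The check in (1) is easy to script. Below is a minimal Python sketch, assuming the classic net-tools ifconfig layout shown above (newer ifconfig versions print the counters differently, so the regular expression would need adjusting). The function names are my own and purely illustrative.

```python
import re

# Sample text copied from the ifconfig output above.
SAMPLE = """\
eth0 Link encap:Ethernet HWaddr 00:0B:DB:4B:A2:04
inet addr:130.35.25.110 Bcast:130.35.27.255 Mask:255.255.252.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0
"""


def rx_tx_counters(ifconfig_text):
    """Return {'RX': {...}, 'TX': {...}} with the integer counters of each line."""
    counters = {}
    for line in ifconfig_text.splitlines():
        line = line.strip()
        match = re.match(r"(RX|TX) packets:", line)
        if not match:
            continue
        # Every "name:number" pair on the line becomes one counter.
        pairs = re.findall(r"(\w+):(\d+)", line)
        counters[match.group(1)] = {name: int(value) for name, value in pairs}
    return counters


def nonzero_problems(counters):
    """List (direction, counter, value) for each non-zero counter except 'packets'."""
    return [
        (direction, name, value)
        for direction, fields in counters.items()
        for name, value in fields.items()
        if name != "packets" and value
    ]


if __name__ == "__main__":
    for direction, name, value in nonzero_problems(rx_tx_counters(SAMPLE)):
        print(f"{direction} {name}: {value}")
```

In practice the text would come from running ifconfig on the interconnect interface of each node rather than from a pasted sample.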

(2) netstat showing IP packet reassembly failures

netstat -s

Ip:

84884742 total packets received
1201 fragments dropped after timeout
3384 packet reassembles failed
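A small Python sketch for pulling these counters out of the netstat -s text, assuming the Linux-style "Ip:" section layout shown above. The helper names are illustrative, and the keyword match is just a simple heuristic of mine.

```python
import re

# Sample text copied from the netstat -s output above.
SAMPLE = """\
Ip:
    84884742 total packets received
    1201 fragments dropped after timeout
    3384 packet reassembles failed
"""


def ip_counters(netstat_text):
    """Map each counter description in the 'Ip:' section to its integer value."""
    counters = {}
    in_ip_section = False
    for line in netstat_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Section headers look like "Ip:" or "Udp:" (one word plus a colon).
        if stripped.endswith(":") and " " not in stripped:
            in_ip_section = stripped == "Ip:"
            continue
        match = re.match(r"(\d+)\s+(.*)", stripped)
        if in_ip_section and match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def reassembly_problems(counters):
    """Keep only the non-zero fragment and reassembly counters."""
    keywords = ("fragments dropped", "reassembles failed")
    return {
        name: value
        for name, value in counters.items()
        if value and any(keyword in name for keyword in keywords)
    }


if __name__ == "__main__":
    print(reassembly_problems(ip_counters(SAMPLE)))
```

These counters are cumulative since boot, so comparing two snapshots taken some minutes apart tells you whether the failures are still happening.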


(3) AWR report: the "gc cr block lost" event should never appear



Top 5 Timed Events
~~~~~~~~~~~~~~~~~~
                                          Avg   %Total
                                         wait     Call
Event                 Waits   Time(s)    (ms)     Time   Wait Class
-----------------------------------------------------------------------
log file sync       286,038    49,872     174     41.7   Commit
gc buffer busy      177,315    29,021     164     24.3   Cluster
gc cr block busy    110,348     5,703      52      4.8   Cluster
gc cr block lost      4,272     4,953    1159      4.1   Cluster
cr request retry      6,316     4,668     739      3.9   Other
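To make the numbers concrete: the "Avg wait (ms)" column is just total wait time divided by the number of waits. The Python sketch below recomputes it from the figures in the table above and flags any "lost" event. The function names are illustrative, and the substring test is a simplification of mine, not an Oracle rule.

```python
# (event, waits, total wait time in seconds), copied from the Top 5 table above.
TOP_EVENTS = [
    ("log file sync", 286038, 49872),
    ("gc buffer busy", 177315, 29021),
    ("gc cr block busy", 110348, 5703),
    ("gc cr block lost", 4272, 4953),
    ("cr request retry", 6316, 4668),
]


def avg_wait_ms(waits, time_s):
    """Average wait per event in milliseconds, rounded to the nearest ms."""
    return round(time_s * 1000 / waits)


def lost_block_events(events):
    """Names of events that indicate lost blocks on the interconnect."""
    return [name for name, _waits, _time_s in events if "lost" in name]


if __name__ == "__main__":
    for name, waits, time_s in TOP_EVENTS:
        print(f"{name:<18} {avg_wait_ms(waits, time_s):>5} ms")
    print("lost-block events:", lost_block_events(TOP_EVENTS))
```

For "gc cr block lost" this gives 4,953 s / 4,272 waits, or roughly 1.16 seconds per wait, which is why even a few thousand lost blocks show up in the top five.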

 

The following is from a Metalink document on checking the interconnect:

Q: How do I check for network problems on my interconnect?
A:
1. Confirm that full duplex is set correctly for all interconnect links on all interfaces on both ends. Do not rely on auto negotiation.

2. ifconfig -a will give you an indication of collisions, errors, overruns, and dropped packets.

3. netstat -s will give you a listing of receive packet discards, fragmentation and reassembly errors for IP and UDP.

4. Set the UDP buffers correctly.

5. Check your cabling
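For step 1, the duplex and auto-negotiation settings on Linux can be read from ethtool output. Here is a minimal Python sketch that parses that text. The sample below is illustrative (it is not from the presentation or the Metalink document), eth1 is a hypothetical interconnect interface, and the warning wording is my own.

```python
# Illustrative ethtool-style output for a hypothetical interconnect NIC (eth1).
SAMPLE = """\
Settings for eth1:
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Link detected: yes
"""


def link_settings(ethtool_text):
    """Parse 'Key: value' lines into a dict, skipping lines without a value."""
    settings = {}
    for line in ethtool_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def duplex_warnings(settings):
    """Return human-readable warnings based on the advice in step 1."""
    warnings = []
    if settings.get("Duplex", "").lower() != "full":
        warnings.append("link is not full duplex")
    if settings.get("Auto-negotiation", "").lower() == "on":
        warnings.append("auto-negotiation is on")
    return warnings


if __name__ == "__main__":
    print(duplex_warnings(link_settings(SAMPLE)))
```

The same check has to be repeated on the switch side, since the advice covers both ends of every link.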

Note: If you are seeing issues with RAC, remember that RAC uses UDP as the protocol, while Oracle Clusterware uses TCP/IP.
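Because RAC traffic is UDP, the kernel socket buffer limits matter. On Linux they are exposed under /proc/sys/net/core. The Python sketch below reads them and compares them against minimums that you supply. The minimum used in the example is a placeholder, not a recommendation; take the real values from the Oracle installation guide for your release. The function names are illustrative.

```python
from pathlib import Path

# Linux kernel settings for default and maximum socket buffer sizes (bytes).
BUFFER_KEYS = ("rmem_default", "rmem_max", "wmem_default", "wmem_max")


def read_socket_buffers(base="/proc/sys/net/core"):
    """Return the current value of each buffer setting found under `base`."""
    sizes = {}
    for key in BUFFER_KEYS:
        path = Path(base) / key
        if path.exists():
            sizes[key] = int(path.read_text().strip())
    return sizes


def below_minimum(sizes, minimums):
    """Return {setting: (current, required)} for settings under the minimum."""
    return {
        key: (sizes[key], required)
        for key, required in minimums.items()
        if key in sizes and sizes[key] < required
    }


if __name__ == "__main__":
    # Placeholder minimum for illustration only.
    example_minimums = {"rmem_max": 262144}
    print(below_minimum(read_socket_buffers(), example_minimums))
```

On a non-Linux host the directory does not exist, so read_socket_buffers simply returns an empty dict.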
