Wednesday, October 20, 2010

Composite index selectivity

I did tests with 10053 trace today and found that Oracle CBO behavior changes regarding compute selectivity of 3-column composite index from 9.2.0.8 to 10.2.0.4. (note: no histogram on any columns)

In 9i: selectivty of index(col1, col2, col3)= density (co11) x denisty (col2) x denisity(col3);

In 10g and 11g selectivity of index(col1, col2, col3) = 1/DISTINCT_KEYS where DISTINCT_KEYS is from index stats obtainable through dba_indexes.


Here is the test case which mimics the production able and the problem sql.

----- begin of the test case ----

drop table test;
create table test
as
select level id,
       trunc(mod(level,80)) id1,
       20*trunc(mod(level, 40)) id2,
       1    id3,
       trunc(mod(level,50)) id4
from dual
connect by level <=1000000;


create index test_ix1 on test(id1, id2, id3);

exec dbms_stats.gather_table_stats(user,'test', method_opt=>'FOR ALL COLUMNS size 1', cascade=>true);


alter session set tracefile_identifier = test;
ALTER SESSION SET EVENTS='10053 trace name context forever, level 1';

explain plan for
select id
from test
where id1 = 20
and id3=1
and id2 = 400;

ALTER SESSION SET EVENTS '10053 trace name context off';
Exit
------ end of test case  -----------------------

#### 9.2.0.8  10053 trace excerpt


***************************************
SINGLE TABLE ACCESS PATH
Column:        ID1  Col#: 2      Table: TEST   Alias: TEST
    NDV: 80        NULLS: 0         DENS: 1.2500e-02 LO:  0  HI: 79
    NO HISTOGRAM: #BKT: 1 #VAL: 2
Column:        ID3  Col#: 4      Table: TEST   Alias: TEST
    NDV: 1         NULLS: 0         DENS: 1.0000e+00 LO:  1  HI: 1
    NO HISTOGRAM: #BKT: 1 #VAL: 2
Column:        ID2  Col#: 3      Table: TEST   Alias: TEST
    NDV: 40        NULLS: 0         DENS: 2.5000e-02 LO:  0  HI: 780
    NO HISTOGRAM: #BKT: 1 #VAL: 2
  TABLE: TEST     ORIG CDN: 1000000  ROUNDED CDN: 313  CMPTD CDN: 313
  Access path: tsc  Resc:  857  Resp:  847
  Access path: index (equal)
      Index: TEST_IX1
  TABLE: TEST
      RSC_CPU: 711380   RSC_IO: 82
  IX_SEL:  0.0000e+00  TB_SEL:  3.1250e-04
  BEST_CST: 83.00  PATH: 4  Degree:  1


#### 10.2.0.4  10053 trace excerpt

***************************************
SINGLE TABLE ACCESS PATH
  -----------------------------------------
  BEGIN Single Table Cardinality Estimation
  -----------------------------------------
  Column (#2): ID1(NUMBER)
    AvgLen: 3.00 NDV: 81 Nulls: 0 Density: 0.012346 Min: 0 Max: 79
  Column (#4): ID3(NUMBER)
    AvgLen: 3.00 NDV: 1 Nulls: 0 Density: 1 Min: 1 Max: 1
  Column (#3): ID2(NUMBER)
    AvgLen: 4.00 NDV: 40 Nulls: 0 Density: 0.025 Min: 0 Max: 780
  Table: TEST  Alias: TEST
    Card: Original: 1003092  Rounded: 310  Computed: 309.60  Non Adjusted: 309.60
  -----------------------------------------
  END   Single Table Cardinality Estimation
  -----------------------------------------
  Access Path: TableScan
    Cost:  730.88  Resp: 730.88  Degree: 0
      Cost_io: 700.00  Cost_cpu: 284170229
      Resp_io: 700.00  Resp_cpu: 284170229
  Access Path: index (AllEqRange)
    Index: TEST_IX1
    resc_io: 3173.00  resc_cpu: 27721329
    ix_sel: 0.0125  ix_sel_with_filters: 0.0125
    Cost: 3176.01  Resp: 3176.01  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 730.88  Degree: 1  Resp: 730.88  Card: 309.60  Bytes: 0

Note:
select index_name, distinct_keys from user_indexes where index_name='TEST_IX1';

INDEX_NAME                     DISTINCT_KEYS
------------------------------ -------------
TEST_IX1                                  80


#### 11.2.0.1  10053 trace excerpt


***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for TEST[TEST]
  ColGroup (#1, Index) TEST_IX1
    Col#: 2 3 4    CorStregth: 40.00
  ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.0125
  Table: TEST  Alias: TEST
    Card: Original: 1000000.000000  Rounded: 12500  Computed: 12500.00  Non Adjusted: 12500.00
  Access Path: TableScan
    Cost:  629.80  Resp: 629.80  Degree: 0
      Cost_io: 617.00  Cost_cpu: 283372261
      Resp_io: 617.00  Resp_cpu: 283372261
  ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.0125
  ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.0125
  Access Path: index (AllEqRange)
    Index: TEST_IX1
    resc_io: 3173.00  resc_cpu: 27721329
    ix_sel: 0.012500  ix_sel_with_filters: 0.012500
    Cost: 3174.25  Resp: 3174.25  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 629.80  Degree: 1  Resp: 629.80  Card: 12500.00  Bytes: 0


Notice the ix_sel line in the above 10053 trace excerpt; for 10g and 11g it equal to 1/80. for 9i seems computed through multiplying selectivity of eeach column.

Update:
Came accross Jonathan Lewis's comment, and came to know this CBO behavior changes acctually from 10.1.0.4 ie. 10.1.0.4 is still similar to 9i behavior.


Below is his comment: (http://kr.forums.oracle.com/forums/thread.jspa?messageID=3153224)

--------
For a single-table query with the predicate
(full list of index columns) = (set of values)

The selectivities vary across version as follows:

11.1.0.6: Table selectivity and index selectivity given by distinct_keys in index
10.2.0.1: Table selectivity given by product of column selectivities, index selectivity given by distinct_keys in index
10.1.0.4: Table selectivity and index selectivity given by product of column selectivities

------------

JL's original post about this behavior: http://jonathanlewis.wordpress.com/2008/03/11/everything-changes/

Wednesday, September 15, 2010

Privileges to enable a non-DBA user to generate AWR report

The following privileges seem working for the stated purpose:

GRANT SELECT ON SYS.V_$DATABASE TO xxx
/
GRANT SELECT ON SYS.V_$INSTANCE TO xxx
/
GRANT EXECUTE ON SYS.DBMS_WORKLOAD_REPOSITORY TO xxx
/
GRANT SELECT ON SYS.DBA_HIST_DATABASE_INSTANCE TO xxx
/
GRANT SELECT ON SYS.DBA_HIST_SNAPSHOT TO xxx
/
GRANT ADVISOR TO xxx
/

Monday, August 16, 2010

CURSOR_SHARING and PL/SQL procedure

With cursor_sharing=FORCE, in most cases, Oracle will replace literal values with bind variables, however,this is not the case for PL/SQL blocks. See the following test case. When using CALL proc_name(), it can. This behavior is mentioned in Jonathen Lewis's book "Cost-based Oracle Fundamentals" p159.


SQL>alter session set cursor_sharing=force;

Session altered.

SQL>drop table t;

Table dropped.

SQL>create table t(a number);

Table created.

SQL>
SQL>create or replace procedure denis_proc(a in number)
  2   is
  3  begin
  4    insert into t values(a);
  5   commit;
  6  end;
  7  /

Procedure created.

SQL>
SQL>exec denis_proc(1);

PL/SQL procedure successfully completed.

SQL>exec denis_proc(2);

PL/SQL procedure successfully completed.

SQL>exec denis_proc(3);

PL/SQL procedure successfully completed.

SQL>call denis_proc(4);

Call completed.

SQL>call denis_proc(5);

Call completed.

SQL>call denis_proc(6);

Call completed.

SQL>
SQL>col sql_text format a30
SQL>
SQL>select sql_id, executions, sql_text from v$sqlarea where sql_text like '%denis_proc%';

SQL_ID1G        EXECUTIONS SQL_TEXT
------------- ---------- ------------------------------
1ms5rkrgwt28v          1 BEGIN denis_proc(1); END;
0vcasv51bxgpv          1 BEGIN denis_proc(3); END;
dcna79fxryrkx          1 BEGIN denis_proc(2); END;
6skbuvu6fb9k0          3 call denis_proc(:"SYS_B_0")

SQL>
SQL>select * from t;

         A
----------
         1
         2
         3
         4
         5
         6

6 rows selected.

SQL>

Tuesday, August 03, 2010

Effect of a SQL* Plus environment variable: set pagesize 0

Last night, I troubleshooted a ksh shell script, in which a variable is assigned a value that is returned from SQL* Plus commands. I found that without 'set pagesize 0', the value returned had an undesirable leading space. Below is a test case that demonstrate this effect.


SQL> select col1, col3 from t;
 
      COL1 C
---------- -
         1 Y


$ cat test3.sh 
#!/bin/ksh 
 
RET1=` sqlplus -s  / << EOF
set  echo off head off feed off verify off term off  
select col3 from t
where  col1=1;
EOF
`
 
echo "RET1  >>>>"$RET1"<<<<"
if [[ $RET1 == 'Y' ]]; then
  echo "Yes"
else
  echo "No"
fi
 
RET2=` sqlplus -s  / << EOF
set echo off head off feed off verify off term off 
set pagesize 0
select col3 from t
where  col1=1;
EOF
`
 
echo "RET2  >>>>"$RET2"<<<<"
if [[ $RET2 == 'Y' ]]; then
  echo "Yes"
else
  echo "No"
fi


$ ./test3.sh 
RET1  >>>> Y<<<<
No
RET2  >>>>Y<<<<
Yes

Wednesday, May 26, 2010

Tuning by cardinality feedback and understanding of join cardinality and transitive closure

What comes out of my recent tuning experience of a 13-table join query is my deeper understanding of "Tuning by Cardinality Feedback", join cardinality and transitive closure. The problem for that 13-table join SQL is that Oracle predictes the join cardinality to be 1 always so it chooses nested loop join, whereas the hash join is able to bring the consistent gets from 400K to 6K.

First of all, let's see where the cardinality estimation is wrong. Below are the estimated cardinality from each operations in the first 4-table join: PACKAGE_MARKETS -> MARKETS -> PACKAGE_CATEGORY -> PACKAGE_PRODUCTS

10    9       NESTED LOOPS (Cost=10 Card=1 Bytes=522)
  11   10         NESTED LOOPS (Cost=9 Card=1 Bytes=325)
  12   11           NESTED LOOPS (Cost=4 Card=1 Bytes=229)
  13   12             NESTED LOOPS (Cost=3 Card=1 Bytes=224)
  14   13               NESTED LOOPS (Cost=2 Card=1 Bytes=104)
  15   14                 INDEX (UNIQUE SCAN) OF 'PK_PACKAGE_MARKETS' (UNIQUE) (Cost=1 Card=1 Bytes=12)
  16   14                 TABLE ACCESS (BY INDEX ROWID) OF 'MARKETS' (Cost=1 Card=1 Bytes=92)
  17   16                   INDEX (UNIQUE SCAN) OF 'PK_MARKETS' (UNIQUE)
  18   13               TABLE ACCESS (BY INDEX ROWID) OF 'PACKAGES' (Cost=1 Card=1 Bytes=120)
  19   18                 INDEX (UNIQUE SCAN) OF 'PK_PACKAGES' (UNIQUE)
  20   12             TABLE ACCESS (BY INDEX ROWID) OF 'PACKAGE_CATEGORY' (Cost=1 Card=1 Bytes=5)
  21   20               INDEX (UNIQUE SCAN) OF 'PK_PACKAGE_CATEGORY' (UNIQUE)
  22   11           TABLE ACCESS (BY INDEX ROWID) OF 'PACKAGE_PRODUCTS' (Cost=5 Card=1 Bytes=96)
  23   22             INDEX (RANGE SCAN) OF 'PK_PACKAGE_PRODUCTS' (UNIQUE) (Cost=2 Card=6)
  24   10         TABLE ACCESS (BY INDEX ROWID) OF 'PRODUCTS' (Cost=1 Card=1 Bytes=197


Below are the real cardinalities obtained from SQL_TRACE/TKPROF. we can see the real cardinality coming out of PACKAGE_PRODUCTS table is 2725 and the estimated one is 1 as shown above. This is where the whole execution plan becomes "wrong".

2725         NESTED LOOPS
      1          NESTED LOOPS
      1           NESTED LOOPS
      1            NESTED LOOPS
      1             INDEX UNIQUE SCAN PK_PACKAGE_MARKETS (object id 423961)
      1             TABLE ACCESS BY INDEX ROWID MARKETS
      1              INDEX UNIQUE SCAN PK_MARKETS (object id 423924)
      1            TABLE ACCESS BY INDEX ROWID PACKAGES
      1             INDEX UNIQUE SCAN PK_PACKAGES (object id 423954)
      1           TABLE ACCESS BY INDEX ROWID PACKAGE_CATEGORY
      1            INDEX UNIQUE SCAN PK_PACKAGE_CATEGORY (object id 423956)
   2725          TABLE ACCESS BY INDEX ROWID PACKAGE_PRODUCTS
   2811           INDEX RANGE SCAN PK_PACKAGE_PRODUCTS (object id 423963)

What I did is to add hints to force Oracle choose hash join instead of nested loop join for the remaining 9 tables.

This execersie also arouse my insterest to understand how join cardinality is calculated, thus I re-read Chapter 10 of Jonanth Lewis's Cost-Based Oracle Fundamentals. I found what he says is true ( as always :-)). Below is a test case output for a two table join SQL that involves "transitive closure" - there is a filter on the join column:
SQL>select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
PL/SQL Release 9.2.0.8.0 - Production
CORE 9.2.0.8.0 Production
TNS for Solaris: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production

SQL>
SQL>-- from the count we know the acutual join cardinality should be 86
SQL>select count(*) from packages where package_id='Y9995';

  COUNT(*)
----------
         1

SQL>select count(*) from package_products where package_id='Y9995';

  COUNT(*)
----------
        86

SQL>
SQL>-- JL mentioned "query_reworte_enabled" has impact on join predicate removal
SQL>-- prior to 10g in hist book "Cost-Based Oracle Fundamentals"
SQL>-- I confirmed it is true in our 9i db. I also tested with rewrite/nowrite hints
SQL>-- instead of session level "query_rewrite_enabled parameter". No impact
SQL>-- from the hints.
SQL>
SQL>
SQL>show parameter query_rewrite_enabled

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
query_rewrite_enabled                string      TRUE
SQL>alter session set query_rewrite_enabled=true;

Session altered.

SQL>
SQL>
SQL>-- query_rewrite_enabled=true
SQL>explain plan for
  2  select  *
  3    from  packages a,   -- NROWS 2656
  4        package_products b   -- NROWS 284247
  5   where a.package_id   -- NDV  2656 PK
  6       = b.package_id   -- NDV  2610 NOT NULL
  7   and a.package_id = 'Y9995'
  8  ;

Explained.

SQL>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                    |  Name                | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |     1 |   240 |     5 |
|   1 |  NESTED LOOPS                |                      |     1 |   240 |     5 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PACKAGES             |     1 |   144 |     2 |
|*  3 |    INDEX UNIQUE SCAN         | PK_PACKAGES          |     1 |       |     1 |
|   4 |   TABLE ACCESS BY INDEX ROWID| PACKAGE_PRODUCTS     |     1 |    96 |     3 |
|*  5 |    INDEX RANGE SCAN          | PK_PACKAGE_PRODUCTS  |     1 |       |     2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."PACKAGE_ID"='Y9995')
   5 - access("B"."PACKAGE_ID"='Y9995')
       filter("A"."PACKAGE_ID"="B"."PACKAGE_ID")

Note: cpu costing is off

20 rows selected.

SQL>
SQL>alter session set query_rewrite_enabled=false;

Session altered.

SQL>
SQL>-- query_rewrite_enabled=false
SQL>explain plan for
  2  select  *
  3    from  packages a,   -- NROWS 2656
  4        package_products b   -- NROWS 284247
  5   where a.package_id   -- NDV  2656 PK
  6       = b.package_id   -- NDV  2610 NOT NULL
  7   and a.package_id = 'Y9995'
  8  ;

Explained.

SQL>
SQL>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------
| Id  | Operation                    |  Name                | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |   109 | 26160 |    52 |
|   1 |  NESTED LOOPS                |                      |   109 | 26160 |    52 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PACKAGES             |     1 |   144 |     2 |
|*  3 |    INDEX UNIQUE SCAN         | PK_PACKAGES          |     1 |       |     1 |
|   4 |   TABLE ACCESS BY INDEX ROWID| PACKAGE_PRODUCTS     |   109 | 10464 |    50 |
|*  5 |    INDEX RANGE SCAN          | PK_PACKAGE_PRODUCTS  |   109 |       |     2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."PACKAGE_ID"='Y9995')
   5 - access("B"."PACKAGE_ID"='Y9995')

Note: cpu costing is off

19 rows selected.


The test case showed two scenarios with same SQL and apparently obtained same execution plan, but
1. with query_rewrite_enabled=true, the estimated cardinality is 1
2. with query_rewrite_enabled=false, the estimated cardinality is 109
3. There are differences in the "Predicate Information" section of the execution plan. With query_rewrite_enabled=false, there is no join predicate

So how those cardinilities are calculated by Oracle? If we have two table join in the following form:

select *
  from t1, t2
 where t1.c1 = t2.c2
   and filter_predicates_of_t1
   and filter_predicates_of_t2;

The basic join cardinality formula is as follows (from "Cost-Based Oracle Fundamentals" p266):

Join Selectivity =
  ((num_rows(t1) - num_nulls(t1.c1)) / num_rows(t1)) *
  ((num_rows(t2) - num_nulls(t2.c2)) / num_rows(t2)) /
  greater(num_distinct(t1.c1), num_distinct(t2.c2))

Join Cardinality =
   join selectivity *
   filtered cardinality(t1) * filtered cardinality(t2)


In our scenario 1:

Join Selectivity = 
  (( 2656 -0) / 2656 ) *
  (( 284247 - 0) / 284247) /
  greater(2656, 2619)
  = 1/2656

Join Cardinaltiy =
   (1/2656) *
   (2656/2656) * ( 284247 /2610) 
   = 0.004 ( rounded to 1)

In our senario 2, notice we don't have join predicate actually, Oracle eliminated the Join Selectivity part:

Join Cardinlity =
   (2656/2656) * ( 284247 /2610)
   = 109


My production 9i database has set query_rewrite_enabled=true. So if I wrote the SQL without the join condition as follows::

select  *
 from  packages a,   -- NROWS 2656
       package_products b   -- NROWS 284247
where
      b.package_id = 'Y9995'
  and a.package_id = 'Y9995'
;


I can get the correct cardinality as in the scenaro 2.

BTW, for example, if a=b, and a=5; then b=5, this is called "transitive closure".

Friday, May 07, 2010

Play with my toy 10g RAC VII - Voting Disk

The voting disk is a shared raw disk partition or file on a clustered file system that is accessible to all nodes in the cluster. Its primary purpose is to help in situations where the private network communication fails. When that happens, the cluster is unable to have all nodes remain available because they are no longer able to synchronize I/O to the shared disks. Therefore, some of the nodes must go offline. The voting disk is then used to communicate the node state information used to determine which nodes go offline.


If we have 3 nodes,  presumably to the voting disk each node will write message as follows:


Node 1 writes : I can see Node 2 & 3
Node 2 writes : I can see Node 1 & 3
Node 3 writes : I can see Node 1 & 2


If for example Node 3's private network has problem, the message may become:


Node 1 writes : I can see Node 2 only
Node 2 writes : I can see Node 1 only
Node 3 writes : I can not see either Node 1 or Node 2 ( or it does not write anything)


In this situation, clearly Node 3 should be evicted from the cluster.




To avoid a single point of failure, we can multiplex voting disk. By design, if strictly more than half of the voting disks are up and contain consistent information, the cluster will be fine. That is to say if we have 5 voting disks, we can have at most 2 voting disk failures.
So the number_of_voting_disk = number_of_tolerable_disk_failure * 2 + 1.


This post is to document my test with the following task regarding voting disk administration:


Task - Recover from the lost of voting disks




1. Check the current voting disk configuration


[oracle@rac1 backup]$ crsctl query css votedisk
 0.     0    /dev/raw/raw6
 1.     0    /dev/raw/raw7
 2.     0    /dev/raw/raw8

located 3 votedisk(s).




2. Backup voting disk




[oracle@rac1 backup]$ dd if=/dev/raw/raw6 of=/home/oracle/backup/votingdisk_050710
80325+0 records in
80325+0 records out
[oracle@rac1 backup]$
[oracle@rac1 backup]$ ls -lhtr
total 40M
-rw-r--r--  1 oracle oinstall 40M May  7 16:23 votingdisk_050710



3. Wipe out the first voting disk


dd if=/dev/zero of=/dev/raw/raw6


Note: I have three voting disk files, in my understanding, the cluster should survive with 1 voting disk failure, however,rac1 and rac2 reboot right after I issue this command. I don't know why.


--------------- RAC 1 alert log  --------------------
 
Fri May  7 16:25:21 2010
Trace dumping is performing id=[cdmp_20100507162519]
Fri May  7 16:25:23 2010
Error: KGXGN aborts the instance (6)
Fri May  7 16:25:24 2010
Errors in file /u01/app/oracle/admin/devdb/bdump/devdb1_lmon_10476.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702


--------------- RAC 2 alert log --------------------
ri May  7 16:25:19 2010
Error: KGXGN aborts the instance (6)
Fri May  7 16:25:19 2010
Error: unexpected error (6) from the Cluster Service (LCK0)
Fri May  7 16:25:19 2010
Errors in file /u01/app/oracle/admin/devdb/bdump/devdb2_lmon_3150.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri May  7 16:25:19 2010
Errors in file /u01/app/oracle/admin/devdb/bdump/devdb2_lck0_3236.trc:
ORA-29702: error occurred in Cluster Group Service operation
Fri May  7 16:25:19 2010
LMON: terminating instance due to error 29702
Fri May  7 16:25:21 2010
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/admin/devdb/bdump/devdb2_diag_3146.trc
Fri May  7 16:31:01 2010





4. Restart CRS stack


[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl stop crs
Password:
Stopping resources.

Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[oracle@rac1 ~]$
[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl stop crs
Password:vz123ys

Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[oracle@rac1 ~]$
[oracle@rac1 ~]$
[oracle@rac1 ~]$
[oracle@rac1 ~]$ ps -ef | grep d.bin
oracle   14672 30539  0 16:56 pts/1    00:00:00 grep d.bin
[oracle@rac1 ~]$ ssh rac2 ps -ef | grep d.bin
[oracle@rac1 ~]$ ./crsstat.sh
HA Resource                                   Target     State
-----------                                   ------     -----
error connecting to CRSD at [(ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))] clsccon 184

[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly


[oracle@rac1 ~]$ ps -ef | grep d.bin
root     14242     1  0 16:54 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/crsd.bin reboot
oracle   15219 14240  2 16:58 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/evmd.bin
oracle   15383 15357  2 16:58 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
oracle   15602 30539  0 16:58 pts/1    00:00:00 grep d.bin
[oracle@rac1 ~]$ ssh rac2 ps -ef | grep d.bin
root     23610     1  0 16:56 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/crsd.bin reboot
oracle   24394 23609  2 16:58 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/evmd.bin
oracle   24575 24549  2 16:58 ?        00:00:00 /u01/app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
[oracle@rac1 ~]$ ./crsstat.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             OFFLINE    OFFLINE
ora.devdb.SLBA.devdb1.srv                     OFFLINE    OFFLINE
ora.devdb.SLBA.devdb2.srv                     OFFLINE    OFFLINE
ora.devdb.SNOLBA.cs                           OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb1.srv                   OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb2.srv                   OFFLINE    OFFLINE
ora.devdb.db                                  ONLINE     ONLINE on rac2
ora.devdb.devdb1.inst                         ONLINE     ONLINE on rac1
ora.devdb.devdb2.inst                         ONLINE     ONLINE on rac2
ora.rac1.ASM1.asm                             ONLINE     ONLINE on rac1
ora.rac1.LISTENER_RAC1.lsnr                   ONLINE     ONLINE on rac1
ora.rac1.gsd                                  ONLINE     ONLINE on rac1
ora.rac1.ons                                  ONLINE     ONLINE on rac1
ora.rac1.vip                                  ONLINE     ONLINE on rac1
ora.rac2.ASM2.asm                             ONLINE     ONLINE on rac2
ora.rac2.LISTENER_RAC2.lsnr                   ONLINE     ONLINE on rac2
ora.rac2.gsd                                  ONLINE     ONLINE on rac2
ora.rac2.ons                                  ONLINE     ONLINE on rac2
ora.rac2.vip                                  ONLINE     ONLINE on rac2






5. Check log


[oracle@rac1 rac1]$ tail alertrac1.log
2010-05-07 16:58:50.358
[cssd(15383)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2010-05-07 16:58:54.722
[crsd(14242)]CRS-1201:CRSD started on node rac1.
2010-05-07 16:59:45.626
[cssd(15383)]CRS-1604:CSSD voting file is offline: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1 /log/rac1/cssd/ocssd.log.
2010-05-07 17:00:47.657
[cssd(15383)]CRS-1604:CSSD voting file is offline: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1 /log/rac1/cssd/ocssd.log.
2010-05-07 17:01:49.730
[cssd(15383)]CRS-1604:CSSD voting file is offline: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1 /log/rac1/cssd/ocssd.log.
[oracle@rac1 rac1]$ tail /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log
[    CSSD]2010-05-07 17:00:16.337 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8358600) proc( 0x8389ac8) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:00:22.183 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8358600) proc( 0x838eb90) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:00:30.776 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8358600) proc( 0x838ee90) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:00:47.657 [62401440] >TRACE:   clssnmDiskStateChange: state from 3 to 3 disk (0//dev/raw/ raw6)
[    CSSD]2010-05-07 17:01:07.263 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8358600) proc( 0x837e6b8) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:01:09.009 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x8357dd8) proc( 0x8379340) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:01:49.730 [62401440] >TRACE:   clssnmDiskStateChange: state from 3 to 3 disk (0//dev/raw/ raw6)
[    CSSD]2010-05-07 17:02:09.984 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x835a580) proc( 0x8365a50) pid() proto(10:2:1:1)
[    CSSD]2010-05-07 17:02:51.784 [62401440] >TRACE:   clssnmDiskStateChange: state from 3 to 3 disk (0//dev/raw/ raw6)
[    CSSD]2010-05-07 17:03:12.292 [132250528] >TRACE:   clssgmClientConnectMsg: Connect from con(0x835a580) proc( 0x8365a50) pid() proto(10:2:1:1)



Note: similar message from rac2 alertrac2.log and ocssd.log. It can be seen with three voting disk files, if one of them unavailable, the RAC is still functioning.




6. Wipe out the second voting disk


dd if=/dev/zero of=/dev/raw/raw7


Two nodes reboot right after issuing above command. After reboot, only see evmd running:


[oracle@rac1 ~]$ ps -ef | grep d.bin
oracle    8139  6985  3 17:12 ?        00:00:14 /u01/app/oracle/product/10.2.0/crs_1/bin/evmd.binoracle   11075 10255  0 17:20 pts/1    00:00:00 grep d.bin




7. Check log:


--- alertrac1.log   show two voting disk files are offline

2010-05-07 17:12:14.926
[cssd(8303)]CRS-1604:CSSD voting file is offline: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:12:15.099
[cssd(8303)]CRS-1604:CSSD voting file is offline: /dev/raw/raw7. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:12:15.147
[cssd(8303)]CRS-1605:CSSD voting file is online: /dev/raw/raw8. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
[oracle@rac1 rac1]$

[oracle@rac1 crsd]$ tail crsd.log

2010-05-07 17:14:04.532: [ COMMCRS][36494240]clsc_connect: (0x8655528) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_crs))

2010-05-07 17:14:04.532: [ CSSCLNT][3086931648]clsssInitNative: connect failed, rc 9

2010-05-07 17:14:04.533: [  CRSRTI][3086931648]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2010-05-07 17:14:05.536: [ CRSMAIN][3086931648][PANIC]0CRSD exiting: Could not init the CSS context

2010-05-07 17:14:05.540: [ default][3086931648]Terminating clsd session







8. Restore Voting Disk


[oracle@rac1 ~]$ dd if=/home/oracle/backup/votingdisk_050710 of=/dev/raw/raw6
80325+0 records in
80325+0 records out
[oracle@rac1 ~]$ dd if=/home/oracle/backup/votingdisk_050710 of=/dev/raw/raw7
dd: writing to `/dev/raw/raw7': No space left on device
80263+0 records in
80262+0 records out





9. Restart CRS


[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:
Attempting to start CRS stack
The CRS stack will be started shortly
[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:vz123ys

Attempting to start CRS stack
The CRS stack will be started shortly


---- in alertrac1.log --------------------------

[oracle@rac1 rac1]$ pwd
/u01/app/oracle/product/10.2.0/crs_1/log/rac1
[oracle@rac1 rac1]$ tail -15 alertrac1.log
[cssd(8303)]CRS-1605:CSSD voting file is online: /dev/raw/raw8. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:29:31.679
[cssd(13301)]CRS-1605:CSSD voting file is online: /dev/raw/raw6. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:29:31.714
[cssd(13301)]CRS-1605:CSSD voting file is online: /dev/raw/raw7. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:29:31.729
[cssd(13301)]CRS-1605:CSSD voting file is online: /dev/raw/raw8. Details in /u01/app/oracle/product/10.2.0/crs_1/log/rac1/cssd/ocssd.log.
2010-05-07 17:29:35.433
[cssd(13301)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2010-05-07 17:29:38.247
[crsd(8910)]CRS-1012:The OCR service started on node rac1.
2010-05-07 17:29:38.287
[evmd(13364)]CRS-1401:EVMD started on node rac1.
2010-05-07 17:31:02.432
[crsd(8910)]CRS-1201:CRSD started on node rac1.

---- CTS resource are on-line

[oracle@rac1 ~]$ ~/crsstat.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             OFFLINE    OFFLINE
ora.devdb.SLBA.devdb1.srv                     OFFLINE    OFFLINE
ora.devdb.SLBA.devdb2.srv                     OFFLINE    OFFLINE
ora.devdb.SNOLBA.cs                           OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb1.srv                   OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb2.srv                   OFFLINE    OFFLINE
ora.devdb.db                                  ONLINE     ONLINE on rac2
ora.devdb.devdb1.inst                         ONLINE     ONLINE on rac1
ora.devdb.devdb2.inst                         ONLINE     ONLINE on rac2
ora.rac1.ASM1.asm                             ONLINE     ONLINE on rac1
ora.rac1.LISTENER_RAC1.lsnr                   ONLINE     ONLINE on rac1
ora.rac1.gsd                                  ONLINE     ONLINE on rac1
ora.rac1.ons                                  ONLINE     ONLINE on rac1
ora.rac1.vip                                  ONLINE     ONLINE on rac1
ora.rac2.ASM2.asm                             ONLINE     ONLINE on rac2
ora.rac2.LISTENER_RAC2.lsnr                   ONLINE     ONLINE on rac2
ora.rac2.gsd                                  ONLINE     ONLINE on rac2
ora.rac2.ons                                  ONLINE     ONLINE on rac2
ora.rac2.vip                                  ONLINE     ONLINE on rac2
[oracle@rac1 ~]$ date
Fri May  7 17:37:35 EDT 2010
 




In conclusion, with 3 voting disks, my test did show RAC can be operational if one of them is offline; if two of them are not available, then the CRS daemon can not start at all. However, when zero-out one of the voting disks, the server reboots, this is not desirable, not sure if this is due to my particular environment.

Tuesday, May 04, 2010

Play with my toy 10g RAC VI - Failed to Start CRS Processes due to OCR Out-of-Sync

Today is the first time I am trying to play with my toy RAC again since I have moved OCR and voting disk files to the raw devices last time as described here. To my surprise, I can not start CRS processes (i.e. those seen from ps -ef | grep d.bin) ! First of all, they should be started automatically after server reboots, but they did not. Secondly they did not start after I issued "crsctl start crs" or "/etc/init.d/init.crs start" either. It took me quite a while to find the right log that indicates the problem:

[oracle@rac1 client]$ pwd
/u01/app/oracle/product/10.2.0/crs_1/log/rac1/client
[oracle@rac1 client]$
[oracle@rac1 client]$ tail -10 css.log
2010-04-05 10:15:01.231: [ CSSCLNT][3086931648]clsssInitNative: connect failed, rc 9

2010-04-05 10:15:02.249: [ CSSCLNT][3086931648]clsssInitNative: connect failed, rc 9

2010-04-21 13:44:32.807: [ default][3068479168]prlsndmain: olsnodes successful!!
2010-04-28 11:13:01.999: [  OCRRAW][3086931648]propriogid:1: INVALID FORMAT
2010-04-28 12:02:36.960: [  OCRRAW][3086931648]propriogid:1: INVALID FORMAT
2010-05-04 10:58:31.850: [  OCRRAW][3068479168]propriogid:1: INVALID FORMAT
2010-05-04 10:58:31.932: [ default][3068479168]prlsndmain: olsnodes successful!!
2010-05-04 11:06:06.374: [  OCRRAW][3086931648]propriogid:1: INVALID FORMAT


It showed that OCR had problems. I then noticed "Device/File needs to be synchronized with the other device" in the ocrcheck output:

root@rac1:/u01/app/oracle/product/10.2.0/crs_1/log/rac1/client [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     505928
         Used space (kbytes)      :       5372
         Available space (kbytes) :     500556
         ID                       :  645781380
         Device/File Name         : /dev/raw/raw4
                                    Device/File needs to be synchronized with the other device
         Device/File Name         : /dev/raw/raw5
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded


I executed the following command:

# dd if=/dev/raw/raw5 of=/dev/raw/raw4

Then, I found ocrcheck was good again.
[oracle@rac1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     505928
         Used space (kbytes)      :       5348
         Available space (kbytes) :     500580
         ID                       :  219149972
         Device/File Name         : /dev/raw/raw4
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw5
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

I am not sure it is the right or best way to fix this problem, but it worked. BTW, I tried "ocrconfig -replace ocr /dev/raw/raw5", not working. After that, I was able to bring everything back to normal. Actually, the problem was there when I created ocr and ocrmirror on the raw devices, but I did not notice it at that time.

Friday, April 30, 2010

Execution Plan of a Subquery inside CASE statement

I found that in our 9.2.0.8 database the execution plan of a subquery inside CASE statement can not be seen in the SQL_TRACE/TKPROF report or from the excution plan extracted from the shared pool. However, it can be seen from Autotrace explain plan. I constructed a test case to demonstrate this as follows:

rem script: xplan_diff.sql
rem   execution plan for the subquery in the CASE statement not shown in SQL_TRACE/TKPROF in 9i
rem
rem

spool xplan_diff.log
set echo on

drop table t;
drop table t2;

create table t 
as 
select rownum id,
       'GOOD' val
from all_objects
where rownum <=20;
 
update t set val='BAD' where mod(id,2) = 0;
commit;

create table t2
as
select rownum id,
       'TEST' val
from all_objects
where rownum <=20; 


alter session set tracefile_identifier = test;
alter session set timed_statistics=true;
alter session set events '10046 trace name context forever, level 12';
set autotrace traceonly
select id,
       case when  val = 'GOOD' then 
  (select t2.val 
  from t, t2
  where t.id = t2.id
    and rownum = 1
         )
       else 'NO TEST'
       END  as IS_TEST
from t;
set autotrace off
alter session set events '10046 trace name context forever, level 12';
spool off
exit;


set doc off
doc

---- 9.2.0.8 Autotrace xplan  ---------------------------

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   COUNT (STOPKEY)
   2    1     MERGE JOIN
   3    2       SORT (JOIN)
   4    3         TABLE ACCESS (FULL) OF 'T2'
   5    2       SORT (JOIN)
   6    5         TABLE ACCESS (FULL) OF 'T'
   7    0   TABLE ACCESS (FULL) OF 'T'


----- 9.2.0.8  tkprof  -----------------------------------

Rows     Row Source Operation
-------  ---------------------------------------------------
     20  TABLE ACCESS FULL T
 
 

------ 10.2.0.4  Autotrace  xplan -------------------------------------------
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    20 |   380 |     2   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY      |      |       |       |            |          |
|*  2 |   HASH JOIN         |      |    20 |   640 |     5  (20)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T    |    20 |   260 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T2   |    20 |   380 |     2   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL  | T    |    20 |   380 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)
   2 - access("T"."ID"="T2"."ID")

------ 10.2.0.4  tkprof ------------
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  COUNT STOPKEY (cr=6 pr=0 pw=0 time=1435 us)
      1   HASH JOIN  (cr=6 pr=0 pw=0 time=1417 us)
     20    TABLE ACCESS FULL T (cr=3 pr=0 pw=0 time=109 us)
      1    TABLE ACCESS FULL T2 (cr=3 pr=0 pw=0 time=64 us)
     20  TABLE ACCESS FULL T (cr=4 pr=0 pw=0 time=72 us)

#


Note: in 10.2.0.4 no such problem.


I was puzzled the other day about a similar sql to that in the above test case in our 9i  production database, which has two cursors with very different  sql_plan_hash_value and gets_per_exections when I checked it from v$sql. But when I pulled the execution plan out from v$sql_plan, the execution plan from both cursors are exactly same. Now I understand the subquery execution plan parts are missing. The different cost is due to the join order swapped in the subquery.

Wednesday, April 28, 2010

Play with my toy 10g RAC V - Move OCR and voting disk from OCFS2 to RAW device

My toy 10g RAC was set up based on Vincent Chan's article originally, in which OCFS2 filesystem was used for the shared OCR and votingdisk files.  In this post, I document steps I have taken to move them to RAW devices.

Steps:

1. Create a new shared virtual SCSI hard disk

a. Add a virtual disk (3GB) for RAC1

b. In the "Add Hardward Wizard" and "Hard Disk" windows for RAC2, Select the following option:
Use an Existing Virtual Disk - Choose this option to reuse or share a hard disk from another Virtual Machine

2. Modify virtual machine configuration file

Additional parameters are required to enable disk sharing between the two virtual RAC nodes. (Details see Vincent's article). Additonal reference is here.

3. Identify the device name for the disk

In RAC1 and RAC2, login as root, run command: fdisk -l, the following message can be seen:

Disk /dev/sdf: 3221 MB, 3221225472 bytes
255 heads, 63 sectors/track, 391 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdf doesn't contain a valid partition table


4. Create disk partitions
General reference about fdisk can be seen here.

At the end, I created partitions for /dev/sdf shown as follows:

Disk /dev/sdf: 3221 MB, 3221225472 bytes
255 heads, 63 sectors/track, 391 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1          63      506016   83  Linux
/dev/sdf2              64         126      506047+  83  Linux
/dev/sdf3             127         131       40162+  83  Linux
/dev/sdf4             132         391     2088450    5  Extended
/dev/sdf5             132         136       40131   83  Linux
/dev/sdf6             137         141       40131   83  Linux
/dev/sdf7             142         391     2008093+  83  Linux



Note: in reality, OCR, OCRMIRROR and each votingdisk file should be placed on seperate physical disks. Here I placed them in the same disk but different partitions. This is just for learning purpose.

5. Binding the partition
To map the raw devices to the shared partitions created, I did the following:

a. Edit /etc/sysconfig/rawdevices on both nodes, add the following:

/dev/raw/raw4 /dev/sdf1             
/dev/raw/raw5 /dev/sdf2             
/dev/raw/raw6 /dev/sdf3             
/dev/raw/raw7 /dev/sdf5             
/dev/raw/raw8 /dev/sdf6             
/dev/raw/raw9 /dev/sdf7             

b. Issue: /sbin/service rawdevices restart

root@rac1:/dev/raw [devdb1]# /sbin/service rawdevices restart
Assigning devices:
           /dev/raw/raw1  -->   /dev/sdc1
/dev/raw/raw1:  bound to major 8, minor 33
           /dev/raw/raw2  -->   /dev/sdd1
/dev/raw/raw2:  bound to major 8, minor 49
           /dev/raw/raw3  -->   /dev/sde1
/dev/raw/raw3:  bound to major 8, minor 65
           /dev/raw/raw4  -->   /dev/sdf1
/dev/raw/raw4:  bound to major 8, minor 81
           /dev/raw/raw5  -->   /dev/sdf2
/dev/raw/raw5:  bound to major 8, minor 82
           /dev/raw/raw6  -->   /dev/sdf3
/dev/raw/raw6:  bound to major 8, minor 83
           /dev/raw/raw7  -->   /dev/sdf5
/dev/raw/raw7:  bound to major 8, minor 85
           /dev/raw/raw8  -->   /dev/sdf6
/dev/raw/raw8:  bound to major 8, minor 86
           /dev/raw/raw9  -->   /dev/sdf7
/dev/raw/raw9:  bound to major 8, minor 87
done


Note: when issuing above command in RAC2, I recieve message: /dev/sdfn (n=1-7) is not a block device.
After reboot both node, everything is fine:

root@rac1:~ [devdb1]# raw -qa
/dev/raw/raw1:  bound to major 8, minor 33
/dev/raw/raw2:  bound to major 8, minor 49
/dev/raw/raw3:  bound to major 8, minor 65
/dev/raw/raw4:  bound to major 8, minor 81
/dev/raw/raw5:  bound to major 8, minor 82
/dev/raw/raw6:  bound to major 8, minor 83
/dev/raw/raw7:  bound to major 8, minor 85
/dev/raw/raw8:  bound to major 8, minor 86
/dev/raw/raw9:  bound to major 8, minor 87

[root@rac2 ~]# raw -qa
/dev/raw/raw1:  bound to major 8, minor 33
/dev/raw/raw2:  bound to major 8, minor 49
/dev/raw/raw3:  bound to major 8, minor 65
/dev/raw/raw4:  bound to major 8, minor 81
/dev/raw/raw5:  bound to major 8, minor 82
/dev/raw/raw6:  bound to major 8, minor 83
/dev/raw/raw7:  bound to major 8, minor 85
/dev/raw/raw8:  bound to major 8, minor 86
/dev/raw/raw9:  bound to major 8, minor 87

c. Edit /etc/udev/permissions.d/50-udev.permissions, add the following entries, and reboot

raw/raw1:oracle:dba:0660
raw/raw2:oracle:dba:0660
raw/raw3:oracle:dba:0660
raw/raw4:root:dba:0640
raw/raw5:root:dba:0640
raw/raw6:oracle:dba:0640
raw/raw7:oracle:dba:0640
raw/raw8:oracle:dba:0640
raw/raw9:oracle:dba:0660

Note: my intended use of raw devices is raw4 and raw5 for OCR and OCRMIRROR; raw6, raw7, raw8 for voting disk files.

6. Moving OCR file from OCFS to raw devices
a. shutdown CRS resource
b. Determine current OCR configuration:

[oracle@rac2 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       5364
         Available space (kbytes) :     256780
         ID                       :  645781380
         Device/File Name         : /ocfs/clusterware/ocr
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

c. Relocate ocr and add ocr mirror

root@rac1:~ [devdb1]# ocrconfig -replace ocrmirror /dev/raw/raw5
root@rac1:~ [devdb1]# ocrconfig -replace ocr /dev/raw/raw4

Note: if ocrmirror is not added first, the following errors will be seen:

root@rac1:~ [devdb1]# ocrconfig -replace ocr /dev/raw/raw4
PROT-16: Internal Error

d. Verify and start CRS resources

root@rac1:/ocfs/clusterware [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     505928
         Used space (kbytes)      :       5364
         Available space (kbytes) :     500564
         ID                       :  645781380
         Device/File Name         : /dev/raw/raw4
                                    Device/File needs to be synchronized with the other device
         Device/File Name         : /dev/raw/raw5
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

[May 4,2010, update: I did not notice "Device/File needs to be synchronized  ..." in the above ocrcheck output on Apr 28. CRS processes could be started despite of this. But today I have to fix this to get CRS started. see http://oracle-study-notes.blogspot.com/2010/05/play-with-my-toy-10g-rac-vi-failed-to.html ]

7. Moving voting disk files from OCFS to raw devices

a. Determine the current voting disk location

oracle@rac1:~ [devdb1]$ crsctl query css votedisk
 0.     0    /ocfs/clusterware/votingdisk

located 1 votedisk(s).

b. Shutdown Oracle Clusterware on both node;

oracle@rac1:~ [devdb1]$ sudo $ORA_CRS_HOME/bin/crsctl stop crs
Password:
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
oracle@rac1:~ [devdb1]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl stop crs
Password:vz123ys

Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.


c. Take a backup of all voting disk:
oracle@rac1:~ [devdb1]$ dd if=/ocfs/clusterware/votingdisk of=/home/oracle/backup/votingdisk_042810
20000+0 records in
20000+0 records out

d. Move

root@rac1:~ [devdb1]# crsctl delete votedisk css /ocfs/clusterware/votingdisk -force
Unexpected parameter: votedisk
root@rac1:~ [devdb1]# crsctl delete css votedisk /ocfs/clusterware/votingdisk -force
only one votedisk /ocfs/clusterware/votingdisk configured.
please add another disk before removing the last one.
root@rac1:~ [devdb1]# crsctl add  css votedisk  /dev/raw/raw7 -force
Now formatting voting disk: /dev/raw/raw7
successful addition of votedisk /dev/raw/raw7.
root@rac1:~ [devdb1]# crsctl delete css votedisk /ocfs/clusterware/votingdisk -force
successful deletion of votedisk /ocfs/clusterware/votingdisk.
root@rac1:~ [devdb1]# crsctl add  css votedisk  /dev/raw/raw6 -force
Now formatting voting disk: /dev/raw/raw6
successful addition of votedisk /dev/raw/raw6.
root@rac1:~ [devdb1]# crsctl add  css votedisk  /dev/raw/raw8 -force
Now formatting voting disk: /dev/raw/raw8
successful addition of votedisk /dev/raw/raw8.
root@rac1:~ [devdb1]# crsctl query css votedisk
 0.     0    /dev/raw/raw6
 1.     0    /dev/raw/raw7
 2.     0    /dev/raw/raw8

located 3 votedisk(s).


e. Start Oracle Clusterware and verify everything is ok

oracle@rac1:~ [devdb1]$ sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:
Attempting to start CRS stack
The CRS stack will be started shortly
oracle@rac1:~ [devdb1]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:vz123ys

Attempting to start CRS stack
The CRS stack will be started shortly

--- Wait for several min here  -----------

oracle@rac1:~ [devdb1]$
oracle@rac1:~ [devdb1]$ ./crs_rep.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             ONLINE     ONLINE on rac1
ora.devdb.SLBA.devdb1.srv                     ONLINE     ONLINE on rac1
ora.devdb.SLBA.devdb2.srv                     ONLINE     ONLINE on rac2
ora.devdb.SNOLBA.cs                           ONLINE     ONLINE on rac1
ora.devdb.SNOLBA.devdb1.srv                   ONLINE     ONLINE on rac1
ora.devdb.SNOLBA.devdb2.srv                   ONLINE     ONLINE on rac2
ora.devdb.db                                  ONLINE     ONLINE on rac2
ora.devdb.devdb1.inst                         ONLINE     ONLINE on rac1
ora.devdb.devdb2.inst                         ONLINE     ONLINE on rac2
ora.rac1.ASM1.asm                             ONLINE     ONLINE on rac1
ora.rac1.LISTENER_RAC1.lsnr                   ONLINE     ONLINE on rac1
ora.rac1.gsd                                  ONLINE     ONLINE on rac1
ora.rac1.ons                                  ONLINE     ONLINE on rac1
ora.rac1.vip                                  ONLINE     ONLINE on rac1
ora.rac2.ASM2.asm                             ONLINE     ONLINE on rac2
ora.rac2.LISTENER_RAC2.lsnr                   ONLINE     ONLINE on rac2
ora.rac2.gsd                                  ONLINE     ONLINE on rac2
ora.rac2.ons                                  ONLINE     ONLINE on rac2
ora.rac2.vip                                  ONLINE     ONLINE on rac2
oracle@rac1:~ [devdb1]$ crsctl query css votingdisk
Unexpected parameter: votingdisk
oracle@rac1:~ [devdb1]$ crsctl query css votedisk
 0.     0    /dev/raw/raw6
 1.     0    /dev/raw/raw7
 2.     0    /dev/raw/raw8

located 3 votedisk(s).

Wednesday, April 21, 2010

Play with my toy 10g RAC IV - Test two OCR adminstration tasks

The OCR contains information about the cluster node list, instance-to-node mapping information, and information about Oracle Clusterware resource profiles for applications that may have been customized

Task 1 - Mirror the OCR

Oracle RAC environments do not support more than two OCRs, at most a primary OCR and a second OCR. As my OCR is on the OCFS2 filesystem, I need to first create a new OCR file to complete the task of mirroring the OCR

1). Verify I don't have ocrmirror:

root@rac1:~ [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       5348
         Available space (kbytes) :     256796
         ID                       :  645781380
         Device/File Name         : /ocfs/clusterware/ocr
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

2) Create a new OCR file:

dd if=/dev/zero of=/ocfs/clusterware/ocrmirror.dbf bs=1M count=128

root@rac1:/ocfs/clusterware [devdb1]# dd if=/dev/zero of=/ocfs/clusterware/ocrmirror.dbf bs=1M count=128
128+0 records in
128+0 records out
root@rac1:/ocfs/clusterware [devdb1]# ls -lh
total 144M
-rw-r-----  1 root   oinstall 5.5M Apr 15 10:15 ocr
-rw-r--r--  1 root   root     128M Apr 20 16:10 ocrmirror.dbf
-rw-r--r--  1 oracle oinstall 9.8M Apr 20 16:10 votingdisk


3) Add the OCRMIRROR

root@rac1:/ocfs/clusterware [devdb1]# ocrconfig -replace ocrmirror /ocfs/clusterware/ocrmirror.dbf

root@rac1:/ocfs/clusterware [devdb1]# ls -lhtr
total 397M
-rw-r-----  1 root   oinstall 5.5M Apr 15 10:15 ocr
-rw-r--r--  1 root   root     381M Apr 20 16:13 ocrmirror.dbf
-rw-r--r--  1 oracle oinstall 9.8M Apr 20 16:14 votingdisk


Note: the size of ocrmirror.dbf becomes 381M, this is quite unusal.


4) Verify:

root@rac1:/ocfs/clusterware [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       5348
         Available space (kbytes) :     256796
         ID                       :  645781380
         Device/File Name         : /ocfs/clusterware/ocr
                                    Device/File integrity check succeeded
         Device/File Name         : /ocfs/clusterware/ocrmirror.dbf
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded


5) Remove OCRMRROR if we want

root@rac1:/ocfs/clusterware [devdb1]# ocrconfig -replace ocrmirror
root@rac1:/ocfs/clusterware [devdb1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     262144
         Used space (kbytes)      :       5348
         Available space (kbytes) :     256796
         ID                       :  645781380
         Device/File Name         : /ocfs/clusterware/ocr
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

root@rac1:/ocfs/clusterware [devdb1]# rm ocrmirror.dbf
rm: remove regular file `ocrmirror.dbf'? yes
root@rac1:/ocfs/clusterware [devdb1]#
root@rac1:/ocfs/clusterware [devdb1]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             512M   90M  423M  18% /ocfs




Task 2 - Backup and restore an OCR file

1) For security purpose, generate a logical OCR backup file

[oracle@rac1 ~]$ sudo /u01/app/oracle/product/10.2.0/crs_1/bin/ocrconfig -export ./backup/logicalocrbak
Password:
[oracle@rac1 ~]$ ls -lh ./backup/
total 128K
-rw-r--r--  1 root root 122K Apr 21 15:16 logicalocrbak


2) Delete the OCR file to simulate lost of OCR

root@rac1:/ocfs/clusterware [devdb1]# ls -lhtr
total 138M
-rw-r--r--  1 root   root     128M Apr 21 15:17 ocr
-rw-r--r--  1 oracle oinstall 9.8M Apr 21 15:17 votingdisk
root@rac1:/ocfs/clusterware [devdb1]# rm -f ocr
root@rac1:/ocfs/clusterware [devdb1]# ls -lhtr
total 9.8M
-rw-r--r--  1 oracle oinstall 9.8M Apr 21 15:17 votingdisk

[oracle@rac1 ~]$ ocrcheck
PROT-602: Failed to retrieve data from the cluster registry


3) Locate a physical backup of OCR

[oracle@rac1 ~]$ ocrconfig -showbackup

rac2     2010/04/20 17:58:54     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

rac2     2010/04/19 19:47:53     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

rac2     2010/04/19 15:47:51     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

rac2     2010/04/19 11:47:47     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

rac2     2010/04/19 15:47:51     /u01/app/oracle/product/10.2.0/crs_1/cdata/crs
[oracle@rac1 ~]$ ssh rac2 ls -lhtr /u01/app/oracle/product/10.2.0/crs_1/cdata/crs
total 37M
-rw-r--r--  1 root root 4.6M Apr 12 21:14 week_.ocr
-rw-r--r--  1 root root 5.4M Apr 19 11:47 day.ocr
-rw-r--r--  1 root root 5.4M Apr 19 15:47 backup02.ocr
-rw-r--r--  1 root root 5.4M Apr 19 15:47 week.ocr
-rw-r--r--  1 root root 5.4M Apr 19 19:47 backup01.ocr
-rw-r--r--  1 root root 5.4M Apr 20 17:58 backup00.ocr
-rw-r--r--  1 root root 5.4M Apr 20 17:58 day_.ocr


Note: The Oracle Clusterware automatically creates OCR backups every four hours. Default location: CRS_home/cdata/cluster_name

4) Stop CRS resources on both nodes

[oracle@rac1 ~]$ crs_stop -all
Attempting to stop `ora.devdb.SNOLBA.devdb1.srv` on member `rac1`
Attempting to stop `ora.devdb.SLBA.devdb1.srv` on member `rac1`
Attempting to stop `ora.devdb.SNOLBA.cs` on member `rac1`
Attempting to stop `ora.rac1.gsd` on member `rac1`
Attempting to stop `ora.devdb.SLBA.cs` on member `rac1`
Attempting to stop `ora.devdb.SLBA.devdb2.srv` on member `rac2`
Attempting to stop `ora.devdb.SNOLBA.devdb2.srv` on member `rac2`
Stop of `ora.rac1.gsd` on member `rac1` succeeded.
Stop of `ora.devdb.SLBA.devdb2.srv` on member `rac2` succeeded.
Stop of `ora.devdb.SLBA.devdb1.srv` on member `rac1` succeeded.
Stop of `ora.devdb.SNOLBA.devdb2.srv` on member `rac2` succeeded.
Stop of `ora.devdb.SNOLBA.devdb1.srv` on member `rac1` succeeded.
Attempting to stop `ora.rac1.ons` on member `rac1`
Attempting to stop `ora.rac2.gsd` on member `rac2`
Attempting to stop `ora.rac2.ons` on member `rac2`
Stop of `ora.rac2.gsd` on member `rac2` succeeded.
Attempting to stop `ora.devdb.db` on member `rac2`
Stop of `ora.rac1.ons` on member `rac1` succeeded.
Stop of `ora.rac2.ons` on member `rac2` succeeded.
Stop of `ora.devdb.SLBA.cs` on member `rac1` succeeded.
Stop of `ora.devdb.SNOLBA.cs` on member `rac1` succeeded.
Stop of `ora.devdb.db` on member `rac2` succeeded.
Attempting to stop `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
Attempting to stop `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`
Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
`ora.devdb.devdb1.inst` is already OFFLINE.
Attempting to stop `ora.rac1.ASM1.asm` on member `rac1`
Stop of `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded.
`ora.devdb.devdb2.inst` is already OFFLINE.
Attempting to stop `ora.rac2.ASM2.asm` on member `rac2`
Stop of `ora.rac1.ASM1.asm` on member `rac1` succeeded.
Attempting to stop `ora.rac1.vip` on member `rac1`
Stop of `ora.rac2.ASM2.asm` on member `rac2` succeeded.
Stop of `ora.rac1.vip` on member `rac1` succeeded.
Attempting to stop `ora.rac2.vip` on member `rac2`
Stop of `ora.rac2.vip` on member `rac2` succeeded.
CRS-0216: Could not stop resource 'ora.devdb.devdb1.inst'.

CRS-0216: Could not stop resource 'ora.devdb.devdb2.inst'.

[oracle@rac1 ~]$ ./crs_rep.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             OFFLINE    OFFLINE
ora.devdb.SLBA.devdb1.srv                     OFFLINE    OFFLINE
ora.devdb.SLBA.devdb2.srv                     OFFLINE    OFFLINE
ora.devdb.SNOLBA.cs                           OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb1.srv                   OFFLINE    OFFLINE
ora.devdb.SNOLBA.devdb2.srv                   OFFLINE    OFFLINE
ora.devdb.db                                  OFFLINE    OFFLINE
ora.devdb.devdb1.inst                         OFFLINE    OFFLINE
ora.devdb.devdb2.inst                         OFFLINE    OFFLINE
ora.rac1.ASM1.asm                             OFFLINE    OFFLINE
ora.rac1.LISTENER_RAC1.lsnr                   OFFLINE    OFFLINE
ora.rac1.gsd                                  OFFLINE    OFFLINE
ora.rac1.ons                                  OFFLINE    OFFLINE
ora.rac1.vip                                  OFFLINE    OFFLINE
ora.rac2.ASM2.asm                             OFFLINE    OFFLINE
ora.rac2.LISTENER_RAC2.lsnr                   OFFLINE    OFFLINE
ora.rac2.gsd                                  OFFLINE    OFFLINE
ora.rac2.ons                                  OFFLINE    OFFLINE
ora.rac2.vip                                  OFFLINE    OFFLINE



5) Stop CRS on both node

The following command has failed due to OCR was deleted:

[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl stop crs
Password:
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]

Note: normally we should see:

[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.


Tried to kill ocssd.bin processes on both nodes at OS level. Server reboot automatically after they are killed.
After reboot, no CRS procesess are running.

[oracle@rac1 ~]$ ps -ef | grep d.bin
oracle    8887  8391  0 15:36 pts/1    00:00:00 grep d.bin
[oracle@rac1 ~]$ ssh rac2 ps -ef | grep d.bin
[oracle@rac1 ~]$



6) Restore OCR using the backup identified at step 3

Due to my OCR resides on OCFS2 filesystem, I need to create the file at first.

root@rac1:/ocfs/clusterware [devdb1]# dd if=/dev/zero of=/ocfs/clusterware/ocr bs=1M count=128
128+0 records in
128+0 records out
root@rac1:/ocfs/clusterware [devdb1]# ls -lhtr
total 138M
-rw-r--r--  1 oracle oinstall 9.8M Apr  5 10:17 votingdisk
-rw-r--r--  1 root   root     128M Apr 21 15:41 ocr


Then I can do a restore from backup:


[oracle@rac1 clusterware]$ ssh rac2 sudo $ORA_CRS_HOME/bin/ocrconfig -restore /u01/app/oracle/product/10.2.0/crs_1/cdata/crs/backup00.ocr
Password:vz123ys




[oracle@rac1 ~]$ cluvfy comp ocr -n all -verbose

Verifying OCR integrity

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.



Note: the syntax for restore from logical backup is: ocrconfig -import /home/oracle/backup/ocrlogicbackup

7) Restart CRS on both nodes

[oracle@rac1 ~]$ sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:
Attempting to start CRS stack
The CRS stack will be started shortly
[oracle@rac1 ~]$ ssh rac2 sudo $ORA_CRS_HOME/bin/crsctl start crs
Password:vz123ys

Attempting to start CRS stack
The CRS stack will be started shortly

[oracle@rac1 ~]$ ps -ef | grep d.bin
root      6591     1  0 15:29 ?        00:00:04 /u01/app/oracle/product/10.2.0/crs_1/bin/crsd.bin reboot
oracle    9941  6589  0 15:45 ?        00:00:01 /u01/app/oracle/product/10.2.0/crs_1/bin/evmd.bin
oracle   10071 10045  0 15:45 ?        00:00:01 /u01/app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
oracle   20444  8391  0 15:51 pts/1    00:00:00 grep d.bin



After serveral trials using SRVCTL to shutdown/bring up individual commponent and manually shutdown ASM and RAC2 instances, finally I got:

[oracle@rac1 ~]$ ./crs_rep.sh
HA Resource                                   Target     State
-----------                                   ------     -----
ora.devdb.SLBA.cs                             ONLINE     ONLINE on rac1
ora.devdb.SLBA.devdb1.srv                     ONLINE     ONLINE on rac1
ora.devdb.SLBA.devdb2.srv                     ONLINE     ONLINE on rac2
ora.devdb.SNOLBA.cs                           ONLINE     ONLINE on rac1
ora.devdb.SNOLBA.devdb1.srv                   ONLINE     ONLINE on rac1
ora.devdb.SNOLBA.devdb2.srv                   ONLINE     ONLINE on rac2
ora.devdb.db                                  ONLINE     ONLINE on rac2
ora.devdb.devdb1.inst                         ONLINE     ONLINE on rac1
ora.devdb.devdb2.inst                         ONLINE     ONLINE on rac2
ora.rac1.ASM1.asm                             ONLINE     ONLINE on rac1
ora.rac1.LISTENER_RAC1.lsnr                   ONLINE     ONLINE on rac1
ora.rac1.gsd                                  ONLINE     ONLINE on rac1
ora.rac1.ons                                  ONLINE     ONLINE on rac1
ora.rac1.vip                                  ONLINE     ONLINE on rac1
ora.rac2.ASM2.asm                             ONLINE     ONLINE on rac2
ora.rac2.LISTENER_RAC2.lsnr                   ONLINE     ONLINE on rac2
ora.rac2.gsd                                  ONLINE     ONLINE on rac2
ora.rac2.ons                                  ONLINE     ONLINE on rac2
ora.rac2.vip                                  ONLINE     ONLINE on rac2
[oracle@rac1 ~]$



Ref:
(1) http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/votocr.htm#BABIHADG
(2) OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices. [MOS ID 428681.1]
(3) 'ocrconfig -replace ocr' Fails With PROT-16 [MOS ID 444757.1]

Friday, April 16, 2010

Play with my toy 10g RAC III - Demonstrate Load Balancing Advisory

By design, RAC involves multiple instances accessing the same database. By analogy, a database can be thought of as an island, which is accessible by multiple brigdes (i.e. instances). If there were two brigdes and 10 incoming cars that are being driven to the island, and if the 10 cars are directed randomly toward one of the bridges,  each bridge will be added 5 cars roughly, this is so-called Client-side connect-time load balancing . That is to say regardless of how busy a brigde is, how many cars already on that brigde, and how much longer to travel through that bridge etc, each brige is treated equally. We can easily imagine that if we have a mechanizm to monitor the condition of each brigde and direct the incoming cars based on the current traffic conditions, we will be much better in providing the service. Such a mechanizm in Oracle RAC is called Load balancing Advisory, which can be used in the Server-side connect-time load balancing.  Both client-side and server-side load balancings are applied to the time when a physical connection to a database is initially opened. Since opening a connection is an expensive operation, ideally application should be able to open a connection once, then reuse it for as many times as needed. There is a function called connection pooling at the application server layer, which establishs physical connections once and never close them. With respect to connection pools, there is a feature called "Runtime Connection Load Balancing", esstially it allocates logical connections among the physical connections in the connection pool at the transaction level instead of load balancing at the time of the initial database connection.

In this post, I will describe a test which is to demonstrate server-side connect-time load balancing with load balancing advisory feature. The idea used in this test was inspired from Jim Czuprynski's article , as well as in his Oracle University RAC class, which I attended about two and half years ago.

First of all, a "job" in this test is defined as a workload that involves:

(1) connect to the database as user "DENIS"
(2) do an insert and  an delete to a table 1000 times
(3) disconnect
(4) repeat (1) to (3) during predefined time period, e.g 20 min.

So if I submit 100 such jobs, I would expect 100 sessions count by username "DENIS"

Secondly, I created a service called 'SLBA', which is used for the job to connect to the database. The details about how to create a LBA-enabled service can be seen in Jim's article or Oracle documentation.

The steps are as follows:

1. Query GV$SERVICEMEATIC for instance goodness before starting any jobs


12:39:59 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:40:00   12:40:05       2     100     100
         12:39:58   12:40:01       1     100     100
         12:38:58   12:39:58       1     100     100
         12:38:53   12:39:53       2     100     100


Note: In Oracle 10.2 and above, the GOODNESS indicates how attractive the instance is with respect to processing the workload presented to the database service. Lower numbers are preferred. The umber is internally calculated based on any goal (long or short) that was specified for that database service. In Oracle 10.2 and above, the DELTA column predicts how much GOODNESS will increase for every
additional session that connects to the instance.


2. Start 10 jobs using SERVICE SLBA at RAC2
[oracle@rac2 ~]$ startSLBAload.sh 20


12:41:18 system@DEVDB> @ct
count # of session by username DENIS

       Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:41:27     1          3
2010-04-16 12:41:27     2          7


12:42:30 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:42:39   12:42:45       2     416    9998
         12:42:33   12:42:38       1     384    9844
         12:40:58   12:41:58       1     384    9844
         12:40:53   12:41:54       2     416    9998


3. Start CPU LOAD at RAC2

[oracle@rac2 ~]$ cpu_load.sh &
[1] 8882
[oracle@rac2 ~]$ ps -ef | grep zip
oracle    8883  8882 16 12:43 pts/1    00:00:01 gzip video1.mpg
oracle    8884  8882  8 12:43 pts/1    00:00:00 gzip video2.mpg
Note: I created two gzip/guzip jobs at RAC2 to increase CPU utilization at RAC2.


4. Query GV$SERVICEMEATIC for instance goodness

12:44:46 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:44:48   12:44:54       2    1666    9998
         12:44:43   12:44:49       1     500    9994
         12:43:54   12:44:54       2    1666    9998
         12:42:58   12:43:58       1     500    9994


Note: it is expected to see "Goodness" increasing in RAC2, compared to RAC1, RAC2 should be less attractive to the incomming connections.


5. Start 10 jobs using SERVICE SLBA at RAC2 - total 20 jobs

12:45:55 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:46:11     1          7
2010-04-16 12:46:11     2         13

12:46:03 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:46:10   12:46:14       2   12500    9998
         12:46:03   12:46:07       1     588    9998
         12:44:57   12:45:58       1     588    9998
         12:44:54   12:45:54       2   12500    9998


6. Start 10 jobs using SERVICE SLBA at RAC2 - total 30 jobs

12:47:52 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:48:06     1         15
2010-04-16 12:48:06     2         15

12:47:58 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:48:05   12:48:10       2   12500    9998
         12:47:43   12:47:48       1     750    9998
         12:46:54   12:47:55       2   12500    9998
         12:45:58   12:46:58       1     750    9998


7. Start 10 jobs using SERVICE SLBA at RAC2 - total 40 jobs

12:50:14 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:50:19   12:50:22       1    1428    9998
         12:50:19   12:50:23       2   12500    9998
         12:48:58   12:49:58       1    1428    9998
         12:48:55   12:49:54       2   12500    9998

12:50:21 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:50:39     1         15
2010-04-16 12:50:39     2         25


12:51:23 system@DEVDB> @ct
count # of session by username DENIS


8. Start 10 jobs using SERVICE SLBA at RAC2 - total 50 jobs

Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:51:37     1         26
2010-04-16 12:51:37     2         24

12:51:30 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:51:40   12:51:43       2   12500    9998
         12:51:28   12:51:33       1    2500    9998
         12:49:58   12:50:58       1    2500    9998
         12:49:54   12:50:55       2   12500    9998

Note: We started to see # of session in RAC1 greater than that in RAC2


8. Start 10 jobs using SERVICE SLBA at RAC2 - total 60 jobs

Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:53:15     1         34
2010-04-16 12:53:15     2         26

12:53:09 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:53:15   12:53:20       2   12500    9998
         12:53:08   12:53:13       1    1700    9998
         12:51:58   12:52:58       1    1700    9998
         12:51:55   12:52:55       2   12500    9998


9. Start 10 jobs using SERVICE SLBA at RAC2 - total 70 jobs

count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:54:23     1         42
2010-04-16 12:54:23     2         28

12:54:15 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:54:19   12:54:24       2   12500    9998
         12:54:13   12:54:18       1    2100    9998
         12:52:58   12:53:58       1    2100    9998
         12:52:55   12:53:54       2   12500    9998

9. Start 10 jobs using SERVICE SLBA at RAC2 - total 80 jobs

12:55:37 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:55:55     1         53
2010-04-16 12:55:55     2         27

12:55:50 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:55:55   12:55:59       2   12500    9998
         12:55:48   12:55:52       1    2650    9998
         12:54:52   12:55:55       2   12500    9998
         12:53:58   12:54:58       1    2650    9998

9. Start 10 jobs using SERVICE SLBA at RAC2 - total 90 jobs

12:56:48 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:56:59     1         59
2010-04-16 12:56:59     2         31

12:56:52 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:56:55   12:57:00       2   12500    9998
         12:56:48   12:56:53       1    5000    9998
         12:55:55   12:56:55       2   12500    9998
         12:54:58   12:55:57       1    5000    9998


10. Start 10 jobs using SERVICE SLBA at RAC2 - total 100 jobs

12:58:55 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:59:08     1         65
2010-04-16 12:59:08     2         35

12:59:01 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:59:05   12:59:10       2   12500    9998
         12:58:54   12:58:57       1   12500    9998
         12:57:58   12:58:57       1   12500    9998
         12:57:52   12:58:52       2   12500    9998


Note: At total jobs 100, the GOODNESS of two instances were evened out. The change of GOODNESS inidicated the LBA was working and somehow was used to direct the incomming connections to the less-loaded instance.

Scripts used in this test can be found here.