Friday, April 16, 2010

Play with my toy 10g RAC III - Demonstrate Load Balancing Advisory

By design, RAC involves multiple instances accessing the same database. By analogy, a database can be thought of as an island, which is accessible by multiple brigdes (i.e. instances). If there were two brigdes and 10 incoming cars that are being driven to the island, and if the 10 cars are directed randomly toward one of the bridges,  each bridge will be added 5 cars roughly, this is so-called Client-side connect-time load balancing . That is to say regardless of how busy a brigde is, how many cars already on that brigde, and how much longer to travel through that bridge etc, each brige is treated equally. We can easily imagine that if we have a mechanizm to monitor the condition of each brigde and direct the incoming cars based on the current traffic conditions, we will be much better in providing the service. Such a mechanizm in Oracle RAC is called Load balancing Advisory, which can be used in the Server-side connect-time load balancing.  Both client-side and server-side load balancings are applied to the time when a physical connection to a database is initially opened. Since opening a connection is an expensive operation, ideally application should be able to open a connection once, then reuse it for as many times as needed. There is a function called connection pooling at the application server layer, which establishs physical connections once and never close them. With respect to connection pools, there is a feature called "Runtime Connection Load Balancing", esstially it allocates logical connections among the physical connections in the connection pool at the transaction level instead of load balancing at the time of the initial database connection.

In this post, I will describe a test which is to demonstrate server-side connect-time load balancing with load balancing advisory feature. The idea used in this test was inspired from Jim Czuprynski's article , as well as in his Oracle University RAC class, which I attended about two and half years ago.

First of all, a "job" in this test is defined as a workload that involves:

(1) connect to the database as user "DENIS"
(2) do an insert and  an delete to a table 1000 times
(3) disconnect
(4) repeat (1) to (3) during predefined time period, e.g 20 min.

So if I submit 100 such jobs, I would expect 100 sessions count by username "DENIS"

Secondly, I created a service called 'SLBA', which is used for the job to connect to the database. The details about how to create a LBA-enabled service can be seen in Jim's article or Oracle documentation.

The steps are as follows:

1. Query GV$SERVICEMEATIC for instance goodness before starting any jobs


12:39:59 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:40:00   12:40:05       2     100     100
         12:39:58   12:40:01       1     100     100
         12:38:58   12:39:58       1     100     100
         12:38:53   12:39:53       2     100     100


Note: In Oracle 10.2 and above, the GOODNESS indicates how attractive the instance is with respect to processing the workload presented to the database service. Lower numbers are preferred. The umber is internally calculated based on any goal (long or short) that was specified for that database service. In Oracle 10.2 and above, the DELTA column predicts how much GOODNESS will increase for every
additional session that connects to the instance.


2. Start 10 jobs using SERVICE SLBA at RAC2
[oracle@rac2 ~]$ startSLBAload.sh 20


12:41:18 system@DEVDB> @ct
count # of session by username DENIS

       Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:41:27     1          3
2010-04-16 12:41:27     2          7


12:42:30 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:42:39   12:42:45       2     416    9998
         12:42:33   12:42:38       1     384    9844
         12:40:58   12:41:58       1     384    9844
         12:40:53   12:41:54       2     416    9998


3. Start CPU LOAD at RAC2

[oracle@rac2 ~]$ cpu_load.sh &
[1] 8882
[oracle@rac2 ~]$ ps -ef | grep zip
oracle    8883  8882 16 12:43 pts/1    00:00:01 gzip video1.mpg
oracle    8884  8882  8 12:43 pts/1    00:00:00 gzip video2.mpg
Note: I created two gzip/guzip jobs at RAC2 to increase CPU utilization at RAC2.


4. Query GV$SERVICEMEATIC for instance goodness

12:44:46 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:44:48   12:44:54       2    1666    9998
         12:44:43   12:44:49       1     500    9994
         12:43:54   12:44:54       2    1666    9998
         12:42:58   12:43:58       1     500    9994


Note: it is expected to see "Goodness" increasing in RAC2, compared to RAC1, RAC2 should be less attractive to the incomming connections.


5. Start 10 jobs using SERVICE SLBA at RAC2 - total 20 jobs

12:45:55 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:46:11     1          7
2010-04-16 12:46:11     2         13

12:46:03 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:46:10   12:46:14       2   12500    9998
         12:46:03   12:46:07       1     588    9998
         12:44:57   12:45:58       1     588    9998
         12:44:54   12:45:54       2   12500    9998


6. Start 10 jobs using SERVICE SLBA at RAC2 - total 30 jobs

12:47:52 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:48:06     1         15
2010-04-16 12:48:06     2         15

12:47:58 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:48:05   12:48:10       2   12500    9998
         12:47:43   12:47:48       1     750    9998
         12:46:54   12:47:55       2   12500    9998
         12:45:58   12:46:58       1     750    9998


7. Start 10 jobs using SERVICE SLBA at RAC2 - total 40 jobs

12:50:14 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:50:19   12:50:22       1    1428    9998
         12:50:19   12:50:23       2   12500    9998
         12:48:58   12:49:58       1    1428    9998
         12:48:55   12:49:54       2   12500    9998

12:50:21 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:50:39     1         15
2010-04-16 12:50:39     2         25


12:51:23 system@DEVDB> @ct
count # of session by username DENIS


8. Start 10 jobs using SERVICE SLBA at RAC2 - total 50 jobs

Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:51:37     1         26
2010-04-16 12:51:37     2         24

12:51:30 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:51:40   12:51:43       2   12500    9998
         12:51:28   12:51:33       1    2500    9998
         12:49:58   12:50:58       1    2500    9998
         12:49:54   12:50:55       2   12500    9998

Note: We started to see # of session in RAC1 greater than that in RAC2


8. Start 10 jobs using SERVICE SLBA at RAC2 - total 60 jobs

Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:53:15     1         34
2010-04-16 12:53:15     2         26

12:53:09 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:53:15   12:53:20       2   12500    9998
         12:53:08   12:53:13       1    1700    9998
         12:51:58   12:52:58       1    1700    9998
         12:51:55   12:52:55       2   12500    9998


9. Start 10 jobs using SERVICE SLBA at RAC2 - total 70 jobs

count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:54:23     1         42
2010-04-16 12:54:23     2         28

12:54:15 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:54:19   12:54:24       2   12500    9998
         12:54:13   12:54:18       1    2100    9998
         12:52:58   12:53:58       1    2100    9998
         12:52:55   12:53:54       2   12500    9998

9. Start 10 jobs using SERVICE SLBA at RAC2 - total 80 jobs

12:55:37 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:55:55     1         53
2010-04-16 12:55:55     2         27

12:55:50 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:55:55   12:55:59       2   12500    9998
         12:55:48   12:55:52       1    2650    9998
         12:54:52   12:55:55       2   12500    9998
         12:53:58   12:54:58       1    2650    9998

9. Start 10 jobs using SERVICE SLBA at RAC2 - total 90 jobs

12:56:48 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:56:59     1         59
2010-04-16 12:56:59     2         31

12:56:52 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:56:55   12:57:00       2   12500    9998
         12:56:48   12:56:53       1    5000    9998
         12:55:55   12:56:55       2   12500    9998
         12:54:58   12:55:57       1    5000    9998


10. Start 10 jobs using SERVICE SLBA at RAC2 - total 100 jobs

12:58:55 system@DEVDB> @ct
count # of session by username DENIS

                     Inst
SYSDATE                ID   COUNT(*)
------------------- ----- ----------
2010-04-16 12:59:08     1         65
2010-04-16 12:59:08     2         35

12:59:01 system@DEVDB> @servicemetric.sql SLBA

Service                         Inst    Good
Name     Start Time End Time      ID    ness   DELTA
-------- ---------- ---------- ----- ------- -------
SLBA     12:59:05   12:59:10       2   12500    9998
         12:58:54   12:58:57       1   12500    9998
         12:57:58   12:58:57       1   12500    9998
         12:57:52   12:58:52       2   12500    9998


Note: At total jobs 100, the GOODNESS of two instances were evened out. The change of GOODNESS inidicated the LBA was working and somehow was used to direct the incomming connections to the less-loaded instance.

Scripts used in this test can be found here.

No comments: