Oracle Study Notes by Denis: An experience of tunning a SQL with EXISTS and NOT EXISTS subqueries

Our team was requested to run a SQL on Saturday aternoon. The time window was from 4PM to midnight. Since from midnight a maintenance job was scheduled and the database won't be available, we promised to deliver the results by midnight. However, the DBA responsible for carring out the task had no idea how long this SQl would take. So he would do a dry run on Friday afternoon. I was called to help him.

The SQL has the following structure (actual table names, column name were modified of course):


SELECT *
  FROM a,
       bi,
       bm
 WHERE bm.bsc = 3
   AND bm.bsci = 'S'
   AND a.bid = bi.bid
   AND a.bid = bm.vp_bid
   AND
   NOT EXISTS (
          SELECT  1
            FROM bm bm1
           WHERE bm.vp_bid = bm1.vp_bid
             AND bm1.bsci = 'S'
             AND bm1.bsc NOT IN (3, 4));
union
SELECT  *
  FROM a,
       bi,
       bm
 WHERE bm.bsc = 3
   AND bm.bsci = 'S'
   AND a.bid = bi.bid
   AND a.bid = bm.vp_bid
   AND
      EXISTS (
          SELECT  1
            FROM bm bm1
           WHERE bm.vp_bid = bm1.vp_bid
             AND bm1.bsci = 'S'
             AND bm1.bsc NOT IN (3, 4));

The DBA initially tried to do a dry run using hint parallel 5 for all the three tables. He said using parallel hint he got smaller cost value from the explain plan. I told him if he used parallel 10, he would get even smaller cost value, but this did not mean the SQL would run faster. I told him just start the SQL without any hint and see how long it would take.

The number of rows of the table involved are as follows:



 TABLE_NAME          NUM_ROWS LAST_ANALYZED
 ----------------- ---------- -------------------
 BM                 309688610 2009-01-03 07:22:04
 A                   15837920 2009-01-03 10:54:31
 BI                  15840300 2009-01-03 11:08:33

Our databae is Oracle 9.2.0.8.

It turned out the SQL did not finish after 5.5 hours running. The execution plan looked like:

 
------------------------------------------------------------------------
|Id | Operation                      |  Name | Rows| Bytes |Temp |
                                                             Spc |Cost |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT               |       |     |       |     |  11M|
| 1 |  SORT UNIQUE                   |       | 110M|  9017M|  19G|  11M|
| 2 |   UNION-ALL                    |       |     |       |     |     |
| 3 |    FILTER                      |       |     |       |     |     |
| 4 |     HASH JOIN                  |       |1124K|    79M|  61M| 334K|
| 5 |      HASH JOIN                 |       |1124K|    48M|  38M| 305K|
| 6 |       TABLE ACCESS FULL        | BM    |1124K|    25M|     | 271K|
| 7 |       TABLE ACCESS FULL        | A     |  15M|   319M|     |24889|
| 8 |      TABLE ACCESS FULL         | BI    |  15M|   440M|     |17719|
| 9 |     TABLE ACCESS BY INDEX ROWID| BM    |   5 |    60 |     |   8 |
|10 |      INDEX RANGE SCAN          | BM_FK5|  13 |       |     |   4 |
|11 |    HASH JOIN                   |       | 108M|  8937M|1844M| 703K|
|12 |     HASH JOIN                  |       |  22M|  1587M| 772M| 359K|
|13 |      TABLE ACCESS FULL         | BM    |  22M|   514M|     |  71K|
|14 |      HASH JOIN                 |       |  15M|   759M| 501M|60525|
|15 |       TABLE ACCESS FULL        | A     |  15M|   319M|     |24889|
|16 |       TABLE ACCESS FULL        | BI    |  15M|   440M|     |17719|
|17 |     TABLE ACCESS FULL          | BM    | 115M|  1323M|     | 271K|
------------------------------------------------------------------------

We were worried about this SQL, our manager set up a call with developers at about 7PM. I learned from them that we can use inion all instead of union since there are no possible dulpicate rows from the two parts. Finally we agreed with runing the two parts of the SQL separatly and load the results to two separate temp table. And DBA team only need diliver these two temp talbe to the devlopers.

I took over the task. I started a new test and using the nested loop hint for the first part:

 

SELECT /*+ use_nl(bm, a) use_nl(a, bi) */
   *
  FROM a,
       bi,
       bm
 WHERE bm.bsc = 3
   AND bm.bsci = 'S'
   AND a.bid = bi.bid
   AND a.bid = bm.vp_bid
   AND
   NOT EXISTS (
          SELECT  1
            FROM bm bm1
           WHERE bm.vp_bid = bm1.vp_bid
             AND bm1.bsci = 'S'
             AND bm1.bsc NOT IN (3, 4));

The execution plan for this part is as follows:


------------------------------------------------------------------------
| Id | Operation                       |  Name | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|  0 | SELECT STATEMENT                |       |       |       |    14M|
|  1 |  FILTER                         |       |       |       |       |
|  2 |   TABLE ACCESS BY INDEX ROWID   | BI    |     1 |    29 |     3 |
|  3 |    NESTED LOOPS                 |       |  1106K|    78M|  5797K|
|  4 |     NESTED LOOPS                |       |  1106K|    47M|  2479K|
|  5 |      TABLE ACCESS FULL          | BM    |  1106K|    25M|   267K|
|  6 |      TABLE ACCESS BY INDEX ROWID| A     |     1 |    21 |     2 |
|  7 |       INDEX UNIQUE SCAN         | A_PK  |     1 |       |     1 |
|  8 |     INDEX RANGE SCAN            | BI_PK |     1 |       |     2 |
|  9 |   TABLE ACCESS BY INDEX ROWID   | BM    |     5 |    60 |     8 |
| 10 |    INDEX RANGE SCAN             | BM_FK5|    13 |       |     4 |
------------------------------------------------------------------------

I used parallel hint for the second part:

 
SELECT /*+ parallel (bm, 4) */  
  *
  FROM a,
       bi,
       bm
 WHERE bm.bsc = 3
   AND bm.bsci = 'S'
   AND a.bid = bi.bid
   AND a.bid = bm.vp_bid
   AND
      EXISTS (
          SELECT  1
            FROM bm bm1
           WHERE bm.vp_bid = bm1.vp_bid
             AND bm1.bsci = 'S'
             AND bm1.bsc NOT IN (3, 4));

And the execution plan looks like:


----------------------------------------------------------------------------
Id | Operation            | Name|Rows|Bytes|Temp|Cost | TQ  |IN-OUT|PQ 
                                           |Spc |     |     |      |Distrib 
----------------------------------------------------------------------------
 0 | SELECT STATEMENT     |     |    |     |    | 404K|     |      |        
 1 |  HASH JOIN           |     | 22M|1814M|154M| 404K|98,00| P->S |QC(RAND)
 2 |   TABLE ACCESS FULL  | BI  | 15M| 438M|    |17565|98,00| S->P | HASH   
 3 |   HASH JOIN          |     | 22M|1202M|124M| 378K|98,00| PCWP |        
 4 |    TABLE ACCESS FULL | A   | 15M| 317M|    |24624|98,00| S->P | HASH   
 5 |    HASH JOIN SEMI    |     | 22M| 759M|189M| 348K|98,00| PCWP |        
 6 |     TABLE ACCESS FULL| BM  | 22M| 506M|    |66931|98,00| P->P | HASH   
 7 |     TABLE ACCESS FULL| BM  |113M|1301M|    | 267K|98,00| S->P | HASH   
----------------------------------------------------------------------------

Good news was the first part finished in 3 hours, and the second part took 2 h 40 min. On Saturday afternoon, I did the real run and it took a little longer but finished in 4 hours.

At this point, I am not sure I chose the best way to accomplish this task. I just ensured that we deliverd as we promised. I don't know if it could be better if we just adopt the approach of using parallel hint 5 for all the three tables. It was too late for DBA to having sufficient time to find the best way to execute this SQL.

I will re-visit this SQL if I have time. Acutally I already invest some time to understand anti-join this evening - the first part of this SQL is acutally anti-join, but I have not seen Oracle choose anti-join access path in the current execution plan.

Oracle Study Notes by Denis

I am starting to study SQL Server as well

Sunday, January 18, 2009

An experience of tunning a SQL with EXISTS and NOT EXISTS subqueries

No comments:

Like this Facebook page:

About Me

Twitter

Recent Comments

Blog Archive

Label Cloud

My Blog List

Oracle Links

Random Post