Monday, July 09, 2012

Using SQL Plan Baseline to ensure a 10g execution plan after upgrade to 11g

Recently I have got a chance to use the SQL Plan Management feature - SQL Plan Baseline to fix a regressed sql after upgrade to 11g from 10g. The problem sql looks like:
SELECT 
       fc.gjmfmpbe_type, 
       fc.gjmfmpbe_config_seq, 
       fi.gjmfmpbe_info_seq, 
       fd.gjmfmpbe_detail_seq, 
       fi.physical_gjmf_date, 
       fi.gjmf_name, 
       fi.mpbe_timestamp, 
       fd.gjmf_detail, 
       fc.mog_prefix, 
       fd.gjmf_detail_enc, 
       fds.retry_count, 
       fc.retry_config 
FROM   
       gjmfmpbe_config fc, 
       gjmfmpbe_detail_status fds,
       gjmfmpbe_detail fd, 
       gjmfmpbe_info fi 
WHERE  ( fi.gjmfmpbe_info_seq = fd.gjmfmpbe_info_seq ) 
       AND ( fds.gjmfmpbe_detail_seq = fd.gjmfmpbe_detail_seq ) 
       AND ( fi.gjmfmpbe_config_seq = fc.gjmfmpbe_config_seq ) 
       AND ( fds.status IN ( 'LD', 'RT' ) ) 
       AND ( fc.gjmf_directory = 'BillCycle' ) 
       AND ( fds.thread_id = '1' ) 
ORDER  BY fd.gjmfmpbe_detail_seq;  

The execution plans of this sql in 10g and 11g are shown below, respectively:
10 g execution plan
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                            |       |       |  1477 (100)|          |
|   1 |  SORT ORDER BY                    |                            |    13 |  4641 |  1477   (1)| 00:00:18 |
|   2 |   NESTED LOOPS                    |                            |    13 |  4641 |  1476   (1)| 00:00:18 |
|   3 |    NESTED LOOPS                   |                            |   486 |   116K|   990   (1)| 00:00:12 |
|   4 |     MERGE JOIN CARTESIAN          |                            |   486 | 25758 |    17   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID  | GJMFMPBE_CONFIG            |     1 |    34 |     2   (0)| 00:00:01 |
|   6 |       INDEX RANGE SCAN            | GJMFMPBE_CONFIG_FILEDIR    |     1 |       |     1   (0)| 00:00:01 |
|   7 |      BUFFER SORT                  |                            |   486 |  9234 |    15   (0)| 00:00:01 |
|   8 |       INLIST ITERATOR             |                            |       |       |            |          |
|   9 |        TABLE ACCESS BY INDEX ROWID| GJMFMPBE_DETAIL_STATUS     |   486 |  9234 |    15   (0)| 00:00:01 |
|  10 |         INDEX RANGE SCAN          | GJMFMPBE_DETAIL_STATUS_IX1 |   486 |       |     6   (0)| 00:00:01 |
|  11 |     TABLE ACCESS BY INDEX ROWID   | GJMFMPBE_DETAIL            |     1 |   193 |     2   (0)| 00:00:01 |
|  12 |      INDEX UNIQUE SCAN            | PK_GJMFMPBE_DETAIL         |     1 |       |     1   (0)| 00:00:01 |
|  13 |    TABLE ACCESS BY INDEX ROWID    | GJMFMPBE_INFO              |     1 |   111 |     1   (0)| 00:00:01 |
|  14 |     INDEX UNIQUE SCAN             | PK_GJMFMPBE_INFO           |     1 |       |     0   (0)|          |
----------------------------------------------------------------------------------------------------------------


11 g execution plan 

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                            |       |       | 11130 (100)|       |
|   1 |  SORT ORDER BY                   |                            |   108 | 37692 | 11130   (1)| 00:02:14 |
|*  2 |   HASH JOIN                      |                            |   108 | 37692 | 11129   (1)| 00:02:14 |
|   3 |    TABLE ACCESS BY INDEX ROWID   | GJMFMPBE_CONFIG            |     1 |    25 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN             | GJMFMPBE_CONFIG_FILEDIR    |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    HASH JOIN                     |                            |  4333 |  1370K| 11127   (1)| 00:02:14 |
|   6 |     NESTED LOOPS                 |                            |       |       |            |       |
|   7 |      NESTED LOOPS                |                            |  4333 |   901K|  8733   (1)| 00:01:45 |
|   8 |       TABLE ACCESS BY INDEX ROWID| GJMFMPBE_DETAIL_STATUS     |  4333 | 77994 |    59   (0)| 00:00:01 |
|*  9 |        INDEX RANGE SCAN          | GJMFMPBE_DETAIL_STATUS_IX1 |  2183 |       |    18   (0)| 00:00:01 |
|* 10 |       INDEX UNIQUE SCAN          | PK_GJMFMPBE_DETAIL         |     1 |       |     1   (0)| 00:00:01 |
|  11 |      TABLE ACCESS BY INDEX ROWID | GJMFMPBE_DETAIL            |     1 |   195 |     2   (0)| 00:00:01 |
|  12 |     TABLE ACCESS FULL            | GJMFMPBE_INFO              |   466K|    49M|  2388   (1)| 00:00:29 |
---------------------------------------------------------------------------------------------------------------
The difference lies in the join order. In 10g join order: fc -> fds -> fd -> fi, and in 11g join order: fc -> (fds -> fd -> fi). In 11g the buffer gets per execution is around 427K as shown with below query:
select sql_id, child_number, buffer_gets/executions, executions, sql_plan_baseline from v$sql where sql_id='cz1myj2gx5xwv';

SQL_ID        CHILD_NUMBER BUFFER_GETS/EXECUTIONS EXECUTIONS SQL_PLAN_BASELINE
------------- ------------ ---------------------- ---------- ------------------------------
cz1myj2gx5xwv            0             427447.751        233

Before upgrade, we backed up 10g execution plan in a sql tuning set. so I was able to obtain the buffer gets per execution in 10g by:
SELECT sql_id, plan_hash_value, executions, buffer_gets/executions from
   table(dbms_sqltune.select_sqlset(
       'SPM_STS'
      ,'sql_id=''cz1myj2gx5xwv'''
      , SQLSET_OWNER=>'OPS$ORACLE'
 )
   );  
  


SQL_ID        PLAN_HASH_VALUE EXECUTIONS BUFFER_GETS/EXECUTIONS
------------- --------------- ---------- ----------------------
cz1myj2gx5xwv      1052698328       2742             31115.0627
Now come to the fix. I executed the following procedure to load the 10 plan stored in a SQL tuning set (STS) into SQL plan baselines for this problem sql
DECLARE
  l_plans_loaded  PLS_INTEGER;
BEGIN
  l_plans_loaded := DBMS_SPM.load_plans_from_sqlset(
    SQLSET_NAME => 'SPM_STS'
   ,SQLSET_OWNER => 'OPS$ORACLE'
   ,BASIC_FILTER =>  'sql_id=''cz1myj2gx5xwv'''
 );
END;
/

To verify the problem is fixed, I checked the execution stats shortly after:
select sql_id, child_number, buffer_gets/executions,round(elapsed_time/1000000/executions) ela_secs_exe, executions, sql_plan_baseline from v$sql where sql_id='cz1myj2gx5xwv';

SQL_ID        CHILD_NUMBER BUFFER_GETS/EXECUTIONS ELA_SECS_EXE EXECUTIONS SQL_PLAN_BASELINE
------------- ------------ ---------------------- ------------ ---------- ------------------------------
cz1myj2gx5xwv            0              433577.21         2414        233
cz1myj2gx5xwv            2                  11936            4         14 SQL_PLAN_4090fd80m1dzb80df0a95

WOW! 2414 seconds vs 4 seconds ??? So why the 11g chose the bad execution plan in the first place? I will investigate further if possible.


 ----------------- update Jul. 10, 2012 -----------------

The difference between 11g and 10g plan is not about join order. It is that in 11g Oracle CBO accepted "inlist as filter" access path. I did 10053 trace for the following sql:
select * from  ttq_bbbb.gjmfmpbc_detail_status fds where fds.thread_id= '1' and fds.status in ('LD', 'RT');
The execution plan looks like:
----------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                       | Rows  | Bytes | Cost (%CPU)| Time  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                            |  4332 |   135K|    59   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| GJMFMPBC_DETAIL_STATUS     |  4332 |   135K|    59   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | GJMFMPBC_DETAIL_STATUS_IX1 |  2167 |       |    18   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FDS"."THREAD_ID"='1')
       filter("FDS"."STATUS"='LD' OR "FDS"."STATUS"='RT')
Note: the index GJMFMPBC_DETAIL_STATUS_IX1 is on GJMFMPBC_DETAIL_STATUS ( thread_id, status). In the trace, it can be seen:
Access Path: index (RangeScan)
    Index: GJMFMPBC_DETAIL_STATUS_IX1
    resc_io: 115.00  resc_cpu: 2790630
    ix_sel: 0.000043  ix_sel_with_filters: 0.000043
    Cost: 115.30  Resp: 115.30  Degree: 1
  Considering index with inlist as filter
 
  ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.0000
  Access Path: index (RangeScan)
    Index: GJMFMPBC_DETAIL_STATUS_IX1
    resc_io: 59.00  resc_cpu: 2439947
    ix_sel: 0.000021  ix_sel_with_filters: 0.000021
 ***** Logdef predicate Adjustment ******
 Final IO cst 0.00 , CPU cst 0.00
 ***** End Logdef Adjustment ******
    Cost: 59.26  Resp: 59.26  Degree: 1
  Accepted inlist as filter
  ****** trying bitmap/domain indexes ******
  ****** finished trying bitmap/domain indexes ******
  Best:: AccessPath: IndexRange
  Index: GJMFMPBC_DETAIL_STATUS_IX1
         Cost: 59.26  Degree: 1  Resp: 59.26  Card: 4331.78  Bytes: 0

No comments: