Tuesday, February 15, 2011

Tuning an inefficient SQL with semi-join

Found a sql look like the following in a batch job:

SELECT    
 distinct
 C.INT_PSEFS_ID
,C.ACCOUNT_ID
,C.PSEFS_ID
,C.IS_PQ
,C.psefs_save_date
FROM     
   MYSCH.PSEFS_MSTR A
  ,MYSCH.PSEFS_DTLS B
  ,MYSCH.PSEFS_MSTR C
WHERE    
    A.INT_PSEFS_ID = B.INT_PSEFS_ID
  AND C.PSEFS_ID LIKE SUBSTR (A.PSEFS_ID, 1, 13)|| '%'
  AND A.IS_VALID = 'N' 
  AND B.PSEFS_STATUS_ID IN (4, 6)
  AND A.PSEFS_COMP_DATE  <= SYSDATE - 45
;
The execution plan is as follows:
---------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    66T|  4861T|       |  3064G  (2)|999:59:59 |
|   1 |  HASH UNIQUE                 |                  |    66T|  4861T|    11P|  3064G  (2)|999:59:59 |
|   2 |   TABLE ACCESS BY INDEX ROWID| PSEFS_MSTR       |  3287K|   122M|       |   149K  (1)| 00:29:49 |
|   3 |    NESTED LOOPS              |                  |    66T|  4861T|       |  3027G  (1)|999:59:59 |
|*  4 |     HASH JOIN                |                  |    20M|   794M|   550M|   839K  (3)| 02:47:49 |
|*  5 |      TABLE ACCESS FULL       | PSEFS_DTLS       |    26M|   250M|       |   612K  (2)| 02:02:29 |
|*  6 |      TABLE ACCESS FULL       | PSEFS_MSTR       |    20M|   600M|       |   156K  (6)| 00:31:17 |
|*  7 |     INDEX RANGE SCAN         | PSEFS_MSTR_IX5   |   591K|       |       |  5154   (1)| 00:01:02 |
---------------------------------------------------------------------------------------------------------


INT_PSEFS_ID is a PK of A, and a FK of B, so A join B on INT_PSEFS_ID will generate a result set (refer to as R1) with duplicated records of A; Then we selected out row of C (same table as A) by join it with R1t based on an unusual join condition with LIKE operator. To see the problem by way of example, supposing we have a row in C with PSRFS_ID='ABCDEFGHIJKLM_2'; In R1, we could have several rows with PSRFS_ID in('ABCDEFGHIJKLM_3', 'ABCDEFGHIJKLM_4', 'ABCDEFGHIJKLM_4', 'ABCDEFGHIJKLM'), due to the LIKE, the row in C will match 4 rows in R1, thus generate 4 duplicated rows in the final result set. I rewrote the SQL to use sub-queries to enable the more efficient semi-joins intead of normal joins
select 
C.INT_PSEFS_ID
, C.ACCOUNT_ID
, C.PSEFS_ID,IS_PQ
, C.psefs_save_date
from MYSCH.PSEFS_MSTR C
where
   SUBSTR(C.PSEFS_ID,1,13) in
   (
   select  SUBSTR(A.PSEFS_ID,1,13)
   from MYSCH.PSEFS_MSTR  A
    where  A.IS_VALID = 'N'
     and A.PSEFS_COMP_DATE  <= SYSDATE - 45 
     and exists ( select 1 from MYSCH.PSEFS_DTLS B
     where A.int_psefs_id = B.int_psefs_id
       and B.PSEFS_STATUS_ID IN (4, 6) )
   ) ;

Now the execution plan looks like:
------------------------------------------------------------------------------------------------
| Id  | Operation              | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |               |    26M|  1182M|       |  1174K  (3)| 03:54:49 |
|*  1 |  HASH JOIN RIGHT SEMI  |               |    26M|  1182M|   387M|  1174K  (3)| 03:54:49 |
|   2 |   VIEW                 | VW_NSO_1      |    20M|   155M|       |   839K  (3)| 02:47:49 |
|*  3 |    HASH JOIN RIGHT SEMI|               |    20M|   794M|   550M|   839K  (3)| 02:47:49 |
|*  4 |     TABLE ACCESS FULL  | PSEFS_DTLS    |    26M|   250M|       |   612K  (2)| 02:02:29 |
|*  5 |     TABLE ACCESS FULL  | PSEFS_MSTR    |    20M|   600M|       |   156K  (6)| 00:31:17 |
|   6 |   TABLE ACCESS FULL    | PSEFS_MSTR    |    65M|  2445M|       |   153K  (4)| 00:30:45 |
------------------------------------------------------------------------------------------------


Fig 1. Good query visual representation
           



Fig 2. Bad query visual representaion


Note: though this sql is actually rather simple, the visual approach advocated by Jonathan Lewis in this article really helps me to do the analysis

No comments: