Tuesday, December 23, 2008

Update a join

I have started to work for my current client since Dec 15. This post is the first new thing I've learend from this client site.

Yesterday I was requested to check a "scrub" job that application team was running in a test environment, which would go into production at night. This "scrub" job was essentially to update a column of three tables using the value from the driver table. The developer provided the actual code he used as follows: ( The table names and colunm names were modified)


DECLARE
  counter  NUMBER(10):= 0;
  CURSOR upd_tabs IS SELECT x,  y FROM D;
BEGIN
  FOR rec IN upd_tabs LOOP
    UPDATE T1 SET y= rec.y WHERE x=rec.x;
    UPDATE T2 SET y= rec.y WHERE z=rec.x;
    UPDATE T3 SET y= rec.y WHERE z=rec.x;
    counter:=counter+1;
    IF MOD(counter,1000) = 0 )THEN
       INSERT INTO update_log (COMMIT_TIME, COMMIT_COUNTER)
             VALUES (SYSDATE,  counter);
       COMMIT;
    END IF;
    IF upd_tabs%NOTFOUND THEN
      EXIT;
    END IF;
  END LOOP;
COMMIT;
END;



Maybe several days ago , I came across a thread on AskTom forum, which talks about update the first table with second table. At that time I did not study throughly. I re-visited that thread, and I think Tom' s approach - update a join, can be used in our case. I checked the table involed on production database, found that:

T1 (x, y ... ) - x is primary key 3M

T2 (z, y ... ) - There is an index on z 14M

T3 (z, y ... ) - Primary key on z and another column, z is the leading column 9M

D (x, y ... ) - no primary key constraint, x is not null, but most likly x is unique also


If we can enfore the primary key constraint on D (x), we can write the following sqls:

update
( select
    T1.y T1_y,
    D.y D_y
  from
     T1,
     D
  where T1.x = D.x
)
set
   T1_y = D_y
/



As a rule of thumb, if problem can be solved using only SQL statment, it is always better than using PL/SQL. If we take a close check on the alogrithm of the PL/SQL code above, we can find that it is no difference than the execution plan shown in the following test case: ( This test case is just repeating what Tom Kyte did in the AskTom forum thread )

denis@DB10G>
denis@DB10G> create table t1
2  ( x int constraint t1_pk primary key,
3    y int );

Table created.

denis@DB10G>
denis@DB10G> create table t2
2  ( x int constraint t2_pk primary key,
3    y int );

Table created.

denis@DB10G>
denis@DB10G> insert into t1 values (1,1);

1 row created.

denis@DB10G> insert into t1 values (2,1);

1 row created.

denis@DB10G> insert into t1 values (3,1);

1 row created.

denis@DB10G>
denis@DB10G> insert into t2 values (2,2);

1 row created.

denis@DB10G> insert into t2 values (3,2);

1 row created.

denis@DB10G>
denis@DB10G>
denis@DB10G> -- update a join
denis@DB10G> set autotrace on explain
denis@DB10G> update
2   ( select  /*+ USE_NL(t1) INDEX (t1 t1_pk ) */
3     t1.y t1_y, t2.y t2_y
4       from t1, t2
5     where t1.x = t2.x )
6    set t1_y = t2_y
7  /

2 rows updated.


Execution Plan
----------------------------------------------------------
Plan hash value: 2040170521

---------------------------------------------------------------------------------------
Id   Operation                      Name   Rows   Bytes  Cost (%CPU) Time    
---------------------------------------------------------------------------------------
0  UPDATE STATEMENT                          2    104      5   (0) 00:00:01
1   UPDATE                        T1                                       
2    NESTED LOOPS                            2    104      5   (0) 00:00:01
3     TABLE ACCESS FULL           T2         2     52      3   (0) 00:00:01
4     TABLE ACCESS BY INDEX ROWID T1         1     26      1   (0) 00:00:01
*  5      INDEX UNIQUE SCAN          T1_PK      1             0   (0) 00:00:01
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("T1"."X"="T2"."X")

Note
-----
- dynamic sampling used for this statement

denis@DB10G>
denis@DB10G> set autotrace off;
denis@DB10G>
denis@DB10G>> select * from t1;

X          Y
---------- ----------
1          1
2          2
3          2
denis@DB10G>
denis@DB10G> -- for larger tables the hint may not be necessary, Oracle can
denis@DB10G> -- pick up the right plan
denis@DB10G>
denis@DB10G> insert into t1 select rownum + 5 , 1
2  from all_objects
3  /

49841 rows created.

denis@DB10G>
denis@DB10G> analyze table t1 compute statistics;

Table analyzed.

denis@DB10G>
denis@DB10G> analyze table t2 compute statistics;

Table analyzed.

denis@DB10G>
denis@DB10G> set autotrace on explain
denis@DB10G> update
2     ( select t1.y t1_y, t2.y t2_y
3       from t1, t2
4     where t1.x = t2.x )
5    set t1_y = t2_y
6  /

2 rows updated.


Execution Plan
----------------------------------------------------------
Plan hash value: 2040170521

---------------------------------------------------------------------------------------
Id   Operation                      Name   Rows   Bytes  Cost (%CPU) Time    
---------------------------------------------------------------------------------------
0  UPDATE STATEMENT                          2     20      5   (0) 00:00:01
1   UPDATE                        T1                                       
2    NESTED LOOPS                            2     20      5   (0) 00:00:01
3     TABLE ACCESS FULL           T2         2      8      3   (0) 00:00:01
4     TABLE ACCESS BY INDEX ROWID T1         1      6      1   (0) 00:00:01
*  5      INDEX UNIQUE SCAN          T1_PK      1             0   (0) 00:00:01
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("T1"."X"="T2"."X")

denis@DB10G> set autotrace off
denis@DB10G>



Per Tom:
It should be NOTED that in order for the update of a join to work, T2 in the above example must have a primary or unique key on it. If T2 does not, you'll not be able to update the join. The optimizer needs to know that T1 will be joined to AT MOST 1 row in T2 and the way it does this is by seeing that a primary key or unique constraint exists on the join conditions

No comments: