Showing posts with label update. Show all posts
Showing posts with label update. Show all posts

Friday, February 14, 2014

Update selected rows of a large table through rowid


We have a requirement to update a date column of about 14M rows in a large table of billions of rows .  The challenge is we have limited downtime to perform such a task.
In a Sun SPARC T4-2 machine with 128 virtual CPUs, we tested two different join methods with parallel executions:

-- nested loop took about 8h30
UPDATE TTQ_DQG.DQG_PRODUCT b
    SET billing_end_date = SYSDATE
    where exists (select /*+ parallel (a 12) */ 1 from DBSCRUB.ISOC_MAPPING_5M a where b.bo_id=a.old_bo_id);

-- hash join took about 5h
UPDATE /*+ parallel (b 12) */TTQ_DQG.DQG_PRODUCT b
    SET billing_end_date = SYSDATE
    where exists (select 1 from DBSCRUB.ISOC_MAPPING_5M a where b.bo_id=a.old_bo_id)
   ;

The 5h execution time still does not meet our downtime requirement. I proposed to perform the task in two steps.

(1) create a rowid driver table outside downtime window
(2) update the table with rowid in the downtime window

The test in the same machine shows we can do it in 41 min!
SQL>
SQL> --Step 1. Create a rowid driver table  several hours before downtime
SQL>
SQL> CREATE table  dbscrub.dqg_product_5m_rowid nologging
  2              AS
  3              select /*+ full(a) parallel( a 10) */ a.rowid rid from  TTQ_DQG.dqg_product a
  4              where exists ( select 1 from dbscrub.ISOC_MAPPING_5M b where a.bo_id=b.old_bo_id);
Table created.
Elapsed: 00:22:01.71
SQL>
SQL>
SQL>
SQL>
SQL> --Step 2 update table with rowid access in the downtime window
SQL>
SQL>
SQL> UPDATE  TTQ_DQG.CPF_PRODUCT b
  2  SET billing_end_date = SYSDATE
  3  where b.rowid in  (select /*+ parallel (a, 12) */  rid from DBSCRUB.dqg_product_5m_rowid a)
  4  ;
14431117 rows updated.
Elapsed: 00:41:34.28
SQL>
SQL>
SQL> spool off

Is this approach safe, any concerns? You may check this AskTom thread:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:53140678334596

Tuesday, December 23, 2008

Update a join

I have started to work for my current client since Dec 15. This post is the first new thing I've learend from this client site.

Yesterday I was requested to check a "scrub" job that application team was running in a test environment, which would go into production at night. This "scrub" job was essentially to update a column of three tables using the value from the driver table. The developer provided the actual code he used as follows: ( The table names and colunm names were modified)


DECLARE
  counter  NUMBER(10):= 0;
  CURSOR upd_tabs IS SELECT x,  y FROM D;
BEGIN
  FOR rec IN upd_tabs LOOP
    UPDATE T1 SET y= rec.y WHERE x=rec.x;
    UPDATE T2 SET y= rec.y WHERE z=rec.x;
    UPDATE T3 SET y= rec.y WHERE z=rec.x;
    counter:=counter+1;
    IF MOD(counter,1000) = 0 )THEN
       INSERT INTO update_log (COMMIT_TIME, COMMIT_COUNTER)
             VALUES (SYSDATE,  counter);
       COMMIT;
    END IF;
    IF upd_tabs%NOTFOUND THEN
      EXIT;
    END IF;
  END LOOP;
COMMIT;
END;



Maybe several days ago , I came across a thread on AskTom forum, which talks about update the first table with second table. At that time I did not study throughly. I re-visited that thread, and I think Tom' s approach - update a join, can be used in our case. I checked the table involed on production database, found that:

T1 (x, y ... ) - x is primary key 3M

T2 (z, y ... ) - There is an index on z 14M

T3 (z, y ... ) - Primary key on z and another column, z is the leading column 9M

D (x, y ... ) - no primary key constraint, x is not null, but most likly x is unique also


If we can enfore the primary key constraint on D (x), we can write the following sqls:

update
( select
    T1.y T1_y,
    D.y D_y
  from
     T1,
     D
  where T1.x = D.x
)
set
   T1_y = D_y
/



As a rule of thumb, if problem can be solved using only SQL statment, it is always better than using PL/SQL. If we take a close check on the alogrithm of the PL/SQL code above, we can find that it is no difference than the execution plan shown in the following test case: ( This test case is just repeating what Tom Kyte did in the AskTom forum thread )

denis@DB10G>
denis@DB10G> create table t1
2  ( x int constraint t1_pk primary key,
3    y int );

Table created.

denis@DB10G>
denis@DB10G> create table t2
2  ( x int constraint t2_pk primary key,
3    y int );

Table created.

denis@DB10G>
denis@DB10G> insert into t1 values (1,1);

1 row created.

denis@DB10G> insert into t1 values (2,1);

1 row created.

denis@DB10G> insert into t1 values (3,1);

1 row created.

denis@DB10G>
denis@DB10G> insert into t2 values (2,2);

1 row created.

denis@DB10G> insert into t2 values (3,2);

1 row created.

denis@DB10G>
denis@DB10G>
denis@DB10G> -- update a join
denis@DB10G> set autotrace on explain
denis@DB10G> update
2   ( select  /*+ USE_NL(t1) INDEX (t1 t1_pk ) */
3     t1.y t1_y, t2.y t2_y
4       from t1, t2
5     where t1.x = t2.x )
6    set t1_y = t2_y
7  /

2 rows updated.


Execution Plan
----------------------------------------------------------
Plan hash value: 2040170521

---------------------------------------------------------------------------------------
Id   Operation                      Name   Rows   Bytes  Cost (%CPU) Time    
---------------------------------------------------------------------------------------
0  UPDATE STATEMENT                          2    104      5   (0) 00:00:01
1   UPDATE                        T1                                       
2    NESTED LOOPS                            2    104      5   (0) 00:00:01
3     TABLE ACCESS FULL           T2         2     52      3   (0) 00:00:01
4     TABLE ACCESS BY INDEX ROWID T1         1     26      1   (0) 00:00:01
*  5      INDEX UNIQUE SCAN          T1_PK      1             0   (0) 00:00:01
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("T1"."X"="T2"."X")

Note
-----
- dynamic sampling used for this statement

denis@DB10G>
denis@DB10G> set autotrace off;
denis@DB10G>
denis@DB10G>> select * from t1;

X          Y
---------- ----------
1          1
2          2
3          2
denis@DB10G>
denis@DB10G> -- for larger tables the hint may not be necessary, Oracle can
denis@DB10G> -- pick up the right plan
denis@DB10G>
denis@DB10G> insert into t1 select rownum + 5 , 1
2  from all_objects
3  /

49841 rows created.

denis@DB10G>
denis@DB10G> analyze table t1 compute statistics;

Table analyzed.

denis@DB10G>
denis@DB10G> analyze table t2 compute statistics;

Table analyzed.

denis@DB10G>
denis@DB10G> set autotrace on explain
denis@DB10G> update
2     ( select t1.y t1_y, t2.y t2_y
3       from t1, t2
4     where t1.x = t2.x )
5    set t1_y = t2_y
6  /

2 rows updated.


Execution Plan
----------------------------------------------------------
Plan hash value: 2040170521

---------------------------------------------------------------------------------------
Id   Operation                      Name   Rows   Bytes  Cost (%CPU) Time    
---------------------------------------------------------------------------------------
0  UPDATE STATEMENT                          2     20      5   (0) 00:00:01
1   UPDATE                        T1                                       
2    NESTED LOOPS                            2     20      5   (0) 00:00:01
3     TABLE ACCESS FULL           T2         2      8      3   (0) 00:00:01
4     TABLE ACCESS BY INDEX ROWID T1         1      6      1   (0) 00:00:01
*  5      INDEX UNIQUE SCAN          T1_PK      1             0   (0) 00:00:01
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

5 - access("T1"."X"="T2"."X")

denis@DB10G> set autotrace off
denis@DB10G>



Per Tom:
It should be NOTED that in order for the update of a join to work, T2 in the above example must have a primary or unique key on it. If T2 does not, you'll not be able to update the join. The optimizer needs to know that T1 will be joined to AT MOST 1 row in T2 and the way it does this is by seeing that a primary key or unique constraint exists on the join conditions