Friday, February 14, 2014

Update selected rows of a large table through rowid


We have a requirement to update a date column of about 14M rows in a large table of billions of rows .  The challenge is we have limited downtime to perform such a task.
In a Sun SPARC T4-2 machine with 128 virtual CPUs, we tested two different join methods with parallel executions:

-- nested loop took about 8h30
UPDATE TTQ_DQG.DQG_PRODUCT b
    SET billing_end_date = SYSDATE
    where exists (select /*+ parallel (a 12) */ 1 from DBSCRUB.ISOC_MAPPING_5M a where b.bo_id=a.old_bo_id);

-- hash join took about 5h
UPDATE /*+ parallel (b 12) */TTQ_DQG.DQG_PRODUCT b
    SET billing_end_date = SYSDATE
    where exists (select 1 from DBSCRUB.ISOC_MAPPING_5M a where b.bo_id=a.old_bo_id)
   ;

The 5h execution time still does not meet our downtime requirement. I proposed to perform the task in two steps.

(1) create a rowid driver table outside downtime window
(2) update the table with rowid in the downtime window

The test in the same machine shows we can do it in 41 min!
SQL>
SQL> --Step 1. Create a rowid driver table  several hours before downtime
SQL>
SQL> CREATE table  dbscrub.dqg_product_5m_rowid nologging
  2              AS
  3              select /*+ full(a) parallel( a 10) */ a.rowid rid from  TTQ_DQG.dqg_product a
  4              where exists ( select 1 from dbscrub.ISOC_MAPPING_5M b where a.bo_id=b.old_bo_id);
Table created.
Elapsed: 00:22:01.71
SQL>
SQL>
SQL>
SQL>
SQL> --Step 2 update table with rowid access in the downtime window
SQL>
SQL>
SQL> UPDATE  TTQ_DQG.CPF_PRODUCT b
  2  SET billing_end_date = SYSDATE
  3  where b.rowid in  (select /*+ parallel (a, 12) */  rid from DBSCRUB.dqg_product_5m_rowid a)
  4  ;
14431117 rows updated.
Elapsed: 00:41:34.28
SQL>
SQL>
SQL> spool off

Is this approach safe, any concerns? You may check this AskTom thread:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:53140678334596