We have a requirement to update a date column on about 14M rows in a table of billions of rows. The challenge is that we have only limited downtime in which to perform such a task.
On a Sun SPARC T4-2 machine with 128 virtual CPUs, we tested two different join methods with parallel execution:
-- the nested-loops plan took about 8.5 hours
UPDATE TTQ_DQG.DQG_PRODUCT b
SET billing_end_date = SYSDATE
where exists (select /*+ parallel (a 12) */ 1 from DBSCRUB.ISOC_MAPPING_5M a where b.bo_id=a.old_bo_id);
-- the hash-join plan took about 5 hours
UPDATE /*+ parallel (b 12) */ TTQ_DQG.DQG_PRODUCT b
SET billing_end_date = SYSDATE
where exists (select 1 from DBSCRUB.ISOC_MAPPING_5M a where b.bo_id=a.old_bo_id)
;
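Before committing to either long run, it can be worth confirming that the optimizer actually chooses the intended join method. A hedged sketch (HASH_SJ is a legacy Oracle hint that requests a hash semi-join for the EXISTS subquery; recent versions may ignore it, so always check the plan output rather than trusting the hint):

```sql
-- Sketch only: inspect the plan with EXPLAIN PLAN before the real update.
EXPLAIN PLAN FOR
UPDATE /*+ parallel (b 12) */ TTQ_DQG.DQG_PRODUCT b
SET billing_end_date = SYSDATE
where exists (select /*+ hash_sj */ 1
              from DBSCRUB.ISOC_MAPPING_5M a
              where b.bo_id = a.old_bo_id);

-- Read-only; safe to run outside the downtime window.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```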
The 5-hour execution time still does not meet our downtime requirement, so I proposed performing the task in two steps:
(1) create a rowid driver table outside the downtime window
(2) update the table by rowid inside the downtime window
A test on the same machine shows we can do it in 41 minutes:
SQL>
SQL> --Step 1. Create a rowid driver table several hours before downtime
SQL>
SQL> CREATE table dbscrub.dqg_product_5m_rowid nologging
2 AS
3 select /*+ full(a) parallel( a 10) */ a.rowid rid from TTQ_DQG.dqg_product a
4 where exists ( select 1 from dbscrub.ISOC_MAPPING_5M b where a.bo_id=b.old_bo_id);
Table created.
Elapsed: 00:22:01.71
SQL>
SQL>
SQL>
SQL>
SQL> --Step 2 update table with rowid access in the downtime window
SQL>
SQL>
SQL> UPDATE TTQ_DQG.DQG_PRODUCT b
2 SET billing_end_date = SYSDATE
3 where b.rowid in (select /*+ parallel (a, 12) */ rid from DBSCRUB.dqg_product_5m_rowid a)
4 ;
14431117 rows updated.
Elapsed: 00:41:34.28
SQL>
SQL>
SQL> spool off
Is this approach safe? Any concerns? You may check this AskTom thread:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:53140678334596
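One specific risk to consider: a rowid captured in step 1 can go stale before step 2 if rows physically move in between (e.g. ALTER TABLE ... MOVE, ALTER TABLE ... SHRINK SPACE, partition maintenance, or a partition-key update with row movement enabled). A defensive variant of the step-2 update, sketched below, re-checks the original join predicate so that a rowid which has been reused by an unrelated row cannot update the wrong data; this is an assumption to test against your version, not a guarantee:

```sql
-- Sketch: step-2 update that re-validates the original predicate.
-- A rowid now pointing at an unrelated row fails the EXISTS check
-- and is skipped rather than updated.
UPDATE TTQ_DQG.DQG_PRODUCT b
SET billing_end_date = SYSDATE
where b.rowid in (select /*+ parallel (a 12) */ rid
                  from DBSCRUB.dqg_product_5m_rowid a)
  and exists (select 1 from DBSCRUB.ISOC_MAPPING_5M m
              where b.bo_id = m.old_bo_id);
```

As a sanity check, comparing the reported "rows updated" count against the row count of the driver table would reveal whether any rowids went stale during the gap.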