The problem of duplicated rows was identified in terms of there are mutiple rows having same (oid, eid, mid,st) in the SMC table in the reporting database.
The reason for the existing of the duplicated rows may be as follows:
(1) The PK and UK constraints have not been enabled
(2) In the reporting database, single post has been changed to multiple post queue.
Since there is no testing environment available, the actual reason for the problem has not been able to be confirmed by repeating the issue. However, it is obviously wrong that the PK and UK are not enabled in the SMC table in reporting database.
To fix this problem, I have to identify and remove the duplicated rows and then enable the PK and UK.
First of all, by comparing the row counts of the table in production and reporting using the following query, it can be known that the duplication only occurred during one day. ( Since the table is huge: > 100 GB, it is not wise to try to find the duplicated rows in all date range)
select count(*) from sf.smc where mc_date >= to_date('2008-05-21 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and mc_date <= to_date('2008-05-21 23:59:59', 'YYYY-MM-DD HH24:MI:SS') /Then, I created a temporary table from the SMC table in the reporting database as follows:
create table temp_a nologging as select id, oid, eid, mid, st FROM sf.smc where mc_date > to_date('2008-05-21 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and mc_date > to_date('2008-05-21 23:59:59', 'YYYY-MM-DD HH24:MI:SS') / execute dbms_stats.gather_table_stats(user, 'TEMP_A', null,estimate_percent=>10);The duplicated rows as well as the baseline row ( i.e. the row to which any other rows compared in order to determine if they are duplicated )are displayed through the following sql:
select id, oid, eid, st, mid from smc where (oid, eid, st, mid ) in ( select oid, eid, st, mid FROM temp_a A WHERE rowid > (SELECT min(rowid) FROM temp_a B WHERE B.oid = A.oid and B.eid = A.eid and B.mid = A.mid and B.st = A.st ) ) order by 2,3,4,5The following are the duplicate rows in RPT
ID OID EID ST MID ---------- -------------------- -------------------- ----------- ---------- 1037693787 CICN087494895 489740124 Change 8 1036151673 CICN087494895 489740124 Change 8 1036151856 CICN087494953 430282795 Change 2 1036152154 CICN087494953 430282795 Change 2 1037621342 ICOG087494888 503141443 Install 19 1036151603 ICOG087494888 503141443 Install 19 1036151617 IICN087494988 503141881 Install 8 1036183677 IICN087494988 503141881 Install 8 8 rows selected.Note the column ID is different, to decide which row should be removed, we need to know which ID is presented in the produciton database: The following are the corresponding row in the production
ID OID EID ST MID ---------- -------------------- -------------------- ---------- ---------- 1037693787 CICN087494895 489740124 Change 8 1036152154 CICN087494953 430282795 Change 2 1037621342 ICOG087494888 503141443 Install 19 1036183677 IICN087494988 503141881 Install 8
As a result, the rows with the following id will be deleted in the reporting database:
'1036151673'
'1036151856'
'1036151603'
'1036151617'
Note: schema, table and column names have been modified to hide any possible confidential information
updated: Mar 29, 10 - When dealing with duplicated rows issue, it is obvious we should be very clear about what the meaning of 'duplicate' for a particular table, all columns value are same or just some columns values are same?. If we only consider PK columns, for example (col1, col2) are supposed to be PK, but the constraint are not enabled and we want to delete the duplicated rows in order to create the PK. We can do:
create table tab_tmp
as
select a.rowid, a.col1, a.col2
from tab a
where a.rowid >
( select max(rowid)
from tab b
where a.col1 = b.col1
and a.col2=b.col2);
We maybe want to create index on (col1 col2) if not there first , non-unique of course. Then we delete duplicated row by:
delete from tab a
where a.rowid in ( select rowid from tab_temp);
Updated: Jul 29, 2010 -
In this post: http://viralpatel.net/blogs/2010/06/deleting-duplicate-rows-in-oracle.html , several ways of deleting duplicated rows are summarized. One interesting method is using analytic function, i.e.
DELETE FROM tbl_test WHERE ROWID IN ( SELECT rid FROM (SELECT ROWID rid, ROW_NUMBER () OVER (PARTITION BY ser_no, fst_nm, deptid, cmnt ORDER BY ROWID) rn FROM tbl_test) WHERE rn <>1);
No comments:
Post a Comment