Monday, June 17, 2013

GoldenGate: Steps of Resynchronizing a Table

In the target, Replicat process was abended due to the following error:

2013-06-17 10:17:30  ERROR   OGG-01163  Oracle GoldenGate Delivery for Oracle, rtdnfrd.prm:  Bad column length (3) specified for column TAX_AUTHORITY_TYPE in table WAX_TDN.V_TAX_DETAILS, maximum allowable length is 1.

I checked TAX_AUTHORITY_TYPE colume, it is char(1) in both source and target databases, so not sure how this error comes up. I have to comment it out in the Replicat parameter file in order to re-start the process. As a result, the table is out-of-sync. This actually gives me a chance to pratice steps of resynchronizing a table in GoldenGate for the first time.

I took the following steps:

1. Comment out the table in the Replicat parameter file
Already did.


2. Stop Replicat and start it again so that it continues for unaffected tables
Already did.

3. Record the time stamp on the source system
2013-06-17 13:49

4 Start a copy of the source data for the affected tables
 
Note: before making the copy, try to resolve any long-running transactions

5. Import the copy to the target table

6. Create a new Replicat group for the out-of-sync table, using Begin to start at the source time stamp that you recorded earlier and using the existing train for ExtTrain

Add Replicat  rvtd, ExtTrail ./dirdat/rb, Begin 2013-06-17 13:49:00

7. Create the new parameter file so that it includes 
  HandleCollisions

replicat rvtd
SETENV(ORACLE_HOME="/apps/opt/oracle/product/11.2.0/db_1")
SETENV(ORACLE_SID = "tdnprdfd1")
SETENV(NLS_LANG=AMERICAN_AMERICA.AL32UTF8)
-- DBOPTIONS SUPPRESSTRIGGERS
userid
gg_owner@tdnprdfd1, password AADAAAAAAAAAAAHAJIKGFGKGRJTBJBCIZGUERJNHBFPBLCAEOBUADFWASJMJCDWEICWGBEGHOIRESCPA, encryptkey securekey1
discardfile ./dirrpt/RTDNFRD.dsc, Append, megabytes 1
handlecollisions
assumetargetdefs
MAP WAX_TDN.V_TAX_DETAILS,             TARGET WAX_TDN.V_TAX_DETAILS;


8. Start the new Replicat

9. View the new Replicat's lag until it shows "At EOF, no more records to process"
GGSCI> Send replicat rvtd, GetLag

10. Turn off HandleCollisions in the new replicate with
GGSCI> send replicat rvtd NoHandleCollisions

11. Edit the parameter file to comment out or remove HandleCollisions if you ever bounce the process later
(The next steps merge the table back with the others so that only one Replicat group is needed

12. Stop Extract at source

13. View both Replicats's lag until you see "EOF" again

GGSCI> Send replicat rtdnfrd, GetLag
GGSCI> Send replicat rvtd, GetLag

14 stop both Replicats

15. Uncomment the resynced table in the original Replicat parameter file

16. Start Extract

17. Start the original Replicat:
GGSC> start rtdnfrd

18. Delete the new Replicat that you created:
GGSCI> delete replicat rvtd

All done!

GoldenGate Replicat Process Abending due to Tablespace Full and Discard File Exceeding Max Bytes

We have a cron job set up to monitor errors in the GoldenGate ggserr.log. This monrining, in a target database, we recieved:

< 2013-06-17 02:58:17  ERROR   OGG-01172  Oracle GoldenGate Delivery for Oracle, rvasip.prm:  Discard file (./dirrpt/RVASIP.dsc) exceeded max bytes (1000000).
< 2013-06-17 02:58:17  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, rvasip.prm:  PROCESS ABENDING.


It appeared that Replicat process abended due to  Discard file exceeded max bytes.
Discard file is used by GoldenGate to log records it cannot proccess. The maximum size of the discard file can be specified by MAXBYTES or MEGABYSTS options,
the defaults are 1000000 or 1MB.  If the specified size is exceeded, the process will abend.

Further troubleshooting showed the reason for the discard file filled up was due to a tablespace filled up in this case. In the RVASIP.dsc files we can found:

OCI Error ORA-01653: unable to extend table PPOWNER.VZ_JNR_FEED_TRX_LOG by 8192 in tablespace PPOWNER_DATA_1 (status = 1653). INSERT INTO "PPOWNER"."VZ_JNR_FEED_TRX_LOG" ("ID","JOURNAL_ID"
,"TRX_EXT_ID","BILLED","DB_MODIFICATION_DATE","DB_CREATION_DATE","SUB_TRX_ID","TRX_ID","CHG_ATTR","ACCESS_TYPE") VALUES (:a0,:a1,:a2,:a3,:a4,:a5,:a6,:a7,:a8,:a9)
Aborting transaction on ./dirdat/rt beginning at seqno 21275 rba 3795007
                         error at seqno 21275 rba 5169654


To fix the problem, I renamed RAVSIP.dsc, changed the max bytes of discard file to be 10MB in the parameter file:

discardfile ./dirrpt/RVASIP.dsc, Append,megabytes 10

Then I stopped and started the Replicat process. I have verified that those discarded dmls recorded in the discard file have been applied after Replicat process re-started. no manual intervene is ndeed.

Monday, June 03, 2013

Tuning a Hierarchical Query



Encountered an expensive production sql today, basically it is in the following structure:
select BM.*, V.*  
FROM 
   BMXYZ BM,
   VXYZ V,
   BBXYZ BB
WHERE 
  V.BO_ID=BM.BO_ID
 AND BM.VOL_PARENT_BO_ID IN(SELECT B.VOL_PARENT_BO_ID
                            FROM BMXYZ B
                        START WITH BB.BO_ID=B.VOL_PARENT_BO_ID
                CONNECT BY PRIOR B.BO_ID = B.VOL_PARENT_BO_ID )
AND BB.USER_ID='xyzuvw'
AND V.CONTENT_VENDOR_ID='3000000'
;

At the first glance, it seems there are no join conditions involving BB. Finally I was able to understand what the sql tries to do :

(1) obtain a set of BO_ID's from table BB
(2) for each BO_ID in the set, find all child rows of it from the table BM
(3) finally row source from (2) join table V.

After rewriting it as follows, the query run much faster with only hundreds gets:
select BM.*, V.*  
FROM 
   BMXYZ BM,
   VXYZ V
WHERE 
  V.BO_ID=BM.BO_ID
 AND BM.VOL_PARENT_BO_ID IN(SELECT B.VOL_PARENT_BO_ID
                            FROM BMXYZ B, (select bo_id from BBXYZ  where user_id='xyzuvw') BB
                        START WITH BB.BO_ID=B.VOL_PARENT_BO_ID
                CONNECT BY PRIOR B.BO_ID = B.VOL_PARENT_BO_ID )
AND V.CONTENT_VENDOR_ID='3000000'
;