Thursday, June 19, 2008

Clone a Production Database for Disaster Recovery Purposes

A few months ago I was engaged in a project to create a DR database by cloning a terabyte production database from the data center in CA to the data center in NY, and to set up one-way SharePlex replication from production to DR.

One of the major tasks of this project was to transfer the production database datafiles and archived logs to the DR site. Prior to the implementation window for the replication setup, the production and DR databases were required to be in sync.

To transfer the production database datafiles, we adopted the following procedure for each tablespace (a scripted sketch follows step 3):

1. Put the tablespace in backup mode
alter tablespace [tablespace_name] begin backup

2. Transfer the datafiles that belong to the tablespace.
  We used a compressed transfer method; for example, at the DR site:

ssh2 prod_hostname 'cat ${FNAME} | gzip -c' | gunzip -c > ${FILELOC}/${FNAME}

[Ed. Jul 26, 2009: Check here for a SQL script to generate the command.]

3. End the backup mode

alter tablespace [tablespace_name] end backup
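
For illustration, steps 1-3 could be wrapped in a small script run at the DR site, as sketched below. This is only a sketch, not the script we actually used (our transfer commands were generated by SQL, per the note above); the tablespace argument, the datafile list file, the dbprd TNS alias, the login, and the destination path are all assumptions.

-------------------    per-tablespace transfer sketch ---------------------
#!/bin/sh
# Sketch: hot-backup copy of one tablespace's datafiles to the DR site.
# Assumptions: $1 is the tablespace name; datafiles_$1.txt lists its datafile
# paths (e.g. spooled from dba_data_files on production); dbprd is a TNS alias
# for the production database; FILELOC is the DR-side destination directory.

TS=$1
FILELOC=/db1/u02/oradata/dbdr

# 1. put the tablespace into backup mode on production
echo "alter tablespace ${TS} begin backup;" | sqlplus -s system/password@dbprd

# 2. pull each datafile over, compressed on the wire
for FNAME in `cat datafiles_${TS}.txt`
do
    FNAMESHORT=`basename ${FNAME}`
    ssh2 prod_hostname "cat ${FNAME} | gzip -c" | gunzip -c > ${FILELOC}/${FNAMESHORT}
done

# 3. end the backup mode
echo "alter tablespace ${TS} end backup;" | sqlplus -s system/password@dbprd
---------------- end of per-tablespace transfer sketch --------------------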


All of the production database datafiles were transferred to the DR site one week before the implementation window. During the datafile transfer and the waiting period, we also continuously transferred the archived logs, which would be needed to recover the database. It should be noted that the production database could generate about 30-50 GB of archived logs each weekday; at peak times, a 500 MB archived log could be generated about every 6-7 minutes.

To transfer the archived logs automatically, a cron job was set up at the DR site, which checked every 5 minutes whether new archived logs had been generated at the production site; if it found any, it started the transfer. (Jan 13, 2010 update: it is also possible to use standby technology to achieve the same goal; see a post here.) We had tested that it took about 5 minutes to transfer one archived log (zip, transfer, and unzip). The script for the cron job is shown as follows:



-------------------    transfer archived logs script ---------------------
#!/bin/sh
# Script: xfr_archlog.sh
#

SCRIPTLOC=/home/oracle/xfr_cron
ARCHLOGLOC=/db1arch/archive/dbdr

TSTAMP=`date +%a`

cd ${SCRIPTLOC}

#
# --- check if there are any archived log files newer than
# --- the reference file saved in reffile.txt; save any candidates to xfr_files.txt
#

NEWFILE=`cat reffile.txt`
echo 'find /db1arch/archive/dbprd -name "*.arc" -newer ' $NEWFILE > sshcommand.tmp
SSHCOMMAND=`cat sshcommand.tmp`
ssh prod_hostname $SSHCOMMAND > xfr_files.txt

COUNT=`grep -c "arc" xfr_files.txt`

#
# --- if there are new archived logs that need to be transferred, then log them
# --- and move the reference point forward in reffile.txt
#

if [ $COUNT -ge 1 ]; then
    echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" >> xfr_files_${TSTAMP}.log
    echo "*** `date`" >> xfr_files_${TSTAMP}.log
    echo "The following files need to be transferred:" >> xfr_files_${TSTAMP}.log
    cat xfr_files.txt >> xfr_files_${TSTAMP}.log
    tail -1 xfr_files.txt > reffile.txt
fi
# tail -10 xfr_files_${TSTAMP}.log

#
# --- if there are new archived logs that need to be transferred, then
# --- generate the script: xfr_files.sh
#

if [ $COUNT -ge 1 ]; then
    echo "#!/bin/sh" > xfr_files.sh

    #
    # --- generate one background (&) transfer command per archived log
    #
    cat xfr_files.txt | while read FNAME
    do
        echo $FNAME
        FNAMESHORT=`basename $FNAME`
        echo "ssh2 prod_hostname 'cat ${FNAME} | gzip -c' | gunzip -c > ${ARCHLOGLOC}/${FNAMESHORT} &" >> xfr_files.sh
    done

    echo 'wait' >> xfr_files.sh
    chmod ug+x xfr_files.sh
    echo "file transfer started at `date`; waiting for completion" | tee -a xfr_files_${TSTAMP}.log
    echo ">>> contents of xfr_files.sh" | tee -a xfr_files_${TSTAMP}.log
    cat xfr_files.sh | tee -a xfr_files_${TSTAMP}.log

    #
    # --- execute the generated script
    #
    ./xfr_files.sh
fi

------------end of the transfer archived logs script ---------
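
For scheduling, a crontab entry at the DR site along the following lines would run the script every 5 minutes (a sketch; the path matches the SCRIPTLOC used in the script above):

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/oracle/xfr_cron/xfr_archlog.sh > /dev/null 2>&1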



Prior to the implementation, the DR database was either shut down or in media recovery mode.
To apply the archived logs at the DR site, we took the following basic steps (a consolidated sketch follows the list):

1. Mount the database
2. Issue the recover database command as follows:
SQL> RECOVER database until cancel using backup controlfile;

--- sample screen output ----

SQL> recover database until cancel using backup controlfile;
ORA-00279: change 9977668133031 generated at 04/29/2008 16:21:51 needed for
thread 1
ORA-00289: suggestion : /db1arch/archive/dbdr/dbprd_1_78535.arc
ORA-00280: change 9977668133031 for thread 1 is in sequence #78535


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}


ORA-00279: change 9977668512137 generated at 04/29/2008 16:32:01 needed for
thread 1
ORA-00289: suggestion : /db1arch/archive/dbdr/dbprd_1_78536.arc
ORA-00280: change 9977668512137 for thread 1 is in sequence #78536
ORA-00278: log file '/db1arch/archive/dbdr/dbprd_1_78535.arc' no longer
needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

--- end of sample screen output ----





3. We could choose 'AUTO' to apply all of the available logs, if appropriate.

4. Shut down the database.

5. Prior to the implementation window, make sure the DB is in sync.

6. Open the database:
SQL> alter database open resetlogs;
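
Putting these steps together, one round of log application at the DR site looked roughly like the sketch below; the resetlogs open happens only once, at the implementation window.

--- consolidated recovery sketch ---

SQL> startup mount
SQL> recover database until cancel using backup controlfile;
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
AUTO
SQL> shutdown immediate
SQL> REM ... repeat the mount/recover/shutdown cycle as new logs arrive ...
SQL> REM at the implementation window, mount once more and open:
SQL> startup mount
SQL> alter database open resetlogs;

--- end of consolidated recovery sketch ---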

We encountered a few situations during the media recovery stage:

1. A new datafile was added to a tablespace in the production database, and this tablespace was also in the DR database.

In this case, during media recovery, an unnamed datafile was added to the control file, and we had to rename it. We saw errors similar to the following:

-------

ORA-00279: change 9977688828260 generated at 04/30/2008 03:12:30 needed for
thread 1
ORA-00289: suggestion : /db1arch/archive/dbdr/dbprd_1_78592.arc
ORA-00280: change 9977688828260 for thread 1 is in sequence #78592
ORA-00278: log file '/db1arch/archive/dbdr/dbprd_1_78591.arc' no longer
needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
ORA-00283: recovery session canceled due to errors
ORA-01244: unnamed datafile(s) added to controlfile by media recovery
ORA-01110: data file 191: '/db1/u02/oradata/dbprd/DB_DATA_4_02.dbf'


ORA-01112: media recovery not started


SQL> RECOVER  database using backup controlfile until cancel;
ORA-00283: recovery session canceled due to errors
ORA-01111: name for data file 191 is unknown - rename to correct file
ORA-01110: data file 191: '/apps/opt/oracle/product/9.2.0/dbs/UNNAMED00191'
ORA-01157: cannot identify/lock data file 191 - see DBWR trace file
ORA-01111: name for data file 191 is unknown - rename to correct file
ORA-01110: data file 191: '/apps/opt/oracle/product/9.2.0/dbs/UNNAMED00191'
-------




To fix this problem, mount the database and issue the following command:

SQL> alter database create datafile
'/apps/opt/oracle/product/9.2.0/dbs/UNNAMED00191' as
'/db1/u02/oradata/dbdr/DB_DATA_4_02.dbf';

Note: the directory structures differ between the production and DR sites; typically, production has .../dbprd/... while DR has .../dbdr/...

2. A new datafile was added to a tablespace in the production database; however, this tablespace was not in the DR database.

Unfortunately, I don't remember exactly how this issue was resolved; probably we renamed the unnamed file, then dropped it.

[Ed. Sep 11, 2009: Mount the database, then issue: alter database datafile '<datafile_name>' offline drop; this should fix it.]
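
For illustration, a sketch of that fix in SQL*Plus, reusing the unnamed-file number from the example above (the file number and path are illustrative):

SQL> startup mount
SQL> alter database datafile '/apps/opt/oracle/product/9.2.0/dbs/UNNAMED00191' offline drop;
SQL> recover database until cancel using backup controlfile;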


Note: some tablespaces were not required at DR, so their datafiles were not copied over. At first I had some concerns about that, as I thought Oracle would throw errors during the recovery of the DR database if it could not find the datafiles to apply the archived logs to. It turned out that Oracle just silently ignored those log entries.

There is a Metalink note (224274.1): "How To Make A Copy Of An Open Database For Duplication To A Different Machine", which served as a good reference for this task.

For setting up the SharePlex replication, fortunately we had a good document to follow, and it completed without any surprises.
