Wednesday, January 27, 2010

Complete recovery after lost of all control files and data files.

Assuming all of the control files and one or more data files are lost during media failure, but the online redo logs are intact, can we do a complete recovery? In a previous post, I have demonstrated that it is possible when I have backup control files though I have to open database by resetlogs. In this post, I will demonstrate in this case we can do complete recovery without resetlogs by re-creating the control files.

Note: the scripts used see previous post.

Steps

1. perform hot backup

#> hot_backup.sh

2. Create a table with one row and switch logfile

SQL> drop table system.t;
SQL> create table system.t as select * from all_tables where rownum <=1; SQL> alter system switch logfile;

3. Insert the second row and switch logfile
SQL> insert into system.t select * from all_tables where rownum <=1; SQL> commit;
SQL> alter system switch logfile;

4. Insert the third row and switch logfile
5. Insert the forth row WITHOUT switch logfile

At this point, we should see total 4 rows in the table t:

sys@DBWRKEV1> select count(*) from system.t ;

COUNT(*)
----------
4

6. Delete all data files to simulate media failure, shutdown instance

rm /db1/u01/oradata/dbwrkev1/system01.dbf
rm /db1/u04/oradata/dbwrkev1/undotbs01.dbf
rm /db1/u02/oradata/dbwrkev1/sysaux01.dbf
rm /db1/u02/oradata/dbwrkev1/users01.dbf

7. Restore data files

#> restore_data.sh

8. Re-create control file

idle> conn / as sysdba
Connected to an idle instance.
idle> startup nomount;
ORACLE instance started.
 
Total System Global Area  473956352 bytes
Fixed Size                  1979520 bytes
Variable Size             314575744 bytes
Database Buffers          150994944 bytes
Redo Buffers                6406144 bytes
idle> @ctl
 
Control file created.


Below is the content of ctl.sql

CREATE CONTROLFILE REUSE DATABASE "DBWRKEV1" NORESETLOGS FORCE LOGGING ARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 292
LOGFILE
  GROUP 1 (
    '/db1/u05/archive/dbwrkev1/redo01a.log',
    '/db1/u06/archive/dbwrkev1/redo01b.log'
  ) SIZE 50M,
  GROUP 2 (
    '/db1/u05/archive/dbwrkev1/redo02a.log',
    '/db1/u06/archive/dbwrkev1/redo02b.log'
  ) SIZE 50M,
  GROUP 3 (
    '/db1/u05/archive/dbwrkev1/redo03a.log',
    '/db1/u06/archive/dbwrkev1/redo03b.log'
  ) SIZE 50M
-- STANDBY LOGFILE
DATAFILE
  '/db1/u01/oradata/dbwrkev1/system01.dbf',
  '/db1/u04/oradata/dbwrkev1/undotbs01.dbf',
  '/db1/u02/oradata/dbwrkev1/sysaux01.dbf',
  '/db1/u02/oradata/dbwrkev1/users01.dbf'
CHARACTER SET WE8ISO8859P1
;

Then I did:

idle> recover database;
ORA-00279: change 10467469165231 generated at 01/27/2010 13:36:56 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_7_709302414.arc
ORA-00280: change 10467469165231 for thread 1 is in sequence #7
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469165563 generated at 01/27/2010 13:44:35 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_8_709302414.arc
ORA-00280: change 10467469165563 for thread 1 is in sequence #8
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_7_709302414.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
Log applied.
Media recovery complete.
idle> alter database open;
 
Database altered.
 
idle> select count(*) from system.t;
 
  COUNT(*)
----------
         4
 


Updated Feb 3, 2010 - It seems RMAN recovery can automatically apply changes in the online redo logs. See: http://oracle-study-notes.blogspot.com/2007/07/exercise-using-control-file-autobackup.html

Tuesday, January 26, 2010

NetApp Snapshot backup and recovery for Oracle databases

In my current working environment, we use NetApp Snapshot technology to backup some of  our Oracle databases. I have recently participated in a Snapshot backup and recovery test for three production databases that were migrated to new servers. In the tests, the Storage Admin team performed Snapshot backup and restore, the Unix Admin team performed diskgroup deport and import operations, and we, DBAs, were responsible for the database  recovery.

The databases have file layout in such a way that all of  data files and control files is in the volumn /db1,  online redo log files are in the volumn /db1redo, and archived logs are saved in volumn  /db1arch. The Snapshot backups are performed on all three volumns mentioned. We tested two recovery scenarios:

1. Restore /db1 and /db1redo
2. Restore /db1 only

I think if you are an experienced DBA, you will immediately question the practise of backup and restore of online redo logs. You are right. I had doubts at the time. Today, I have done simulations of these two scenarios in a test envrinment. I have been more clear that we should not do that at all.

Below I will describe the test cases in more details, which are the simulations of what we did in the Snapshot backup and recoery test in the production servers a few days ago.

Case 1 - Restore data files, control files and redo log files (e.g. /db1 and /db1redo)
 
1. Perform hotbackup
    #> hot_backp.sh
    Note: script see Appendix

2. Create a table with one row and switch logfile
    SQL> drop table system.t;
    SQL> create table system.t as select * from all_tables where rownum <=1;
    SQL> alter system switch logfile;

3. Insert the second row and switch logfile
   SQL> insert into system.t select * from all_tables where rownum <=1;
   SQL> commit;
   SQL> alter system switch logfile;

4. Insert the third row and switch logfile

5. Insert the forth row WITHOUT switch logfile
    At this point, we should see total 4 rows in the table t:
    sys@DBWRKEV1> select count(*) from system.t ;
    COUNT(*)
     ---------
     4

6. Delete all data files to simulate media failure, shutdown instance
     rm /db1/u01/oradata/dbwrkev1/system01.dbf
     rm /db1/u04/oradata/dbwrkev1/undotbs01.dbf
     rm /db1/u02/oradata/dbwrkev1/sysaux01.dbf
     rm /db1/u02/oradata/dbwrkev1/users01.dbf

7. Restore data files, control files and redo log file
   #> restore_data.sh
   #> restore_ctl.sh
   #> restore_redo.sh

8. Recover database
   After mount the database, I did:

    idle> recover database;
    Media recovery complete.
    idle> alter database open;
    Database altered.

    So here we observed the same in the real test a few days ago, we thought we did a complete recovery, but really?

     idle> select count(*) from system.t;
              select count(*) from system.t
                *
               ERROR at line 1:
              ORA-00942: table or view does not exist

   We can see the table t does not exist at all.

Case 2 Restore data file and control files (e.g. /db1)

Note: due to our data files and control files are both in /db1, with Snapshot restore we can not restore only one of them. This is one of the disavantages of NetApp Snapshot as I can see compared to RMAN backup and restore.

1-6. same as in case 1

7. Restore data files and control files
    #> restore_data.sh
    #> restore_ctl.sh

8. Recover database
   After mount the database, I did:

idle> recover database until cancel using backup controlfile;
ORA-00279: change 10467469061882 generated at 01/26/2010 10:45:48 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_27_708088679.arc
ORA-00280: change 10467469061882 for thread 1 is in sequence #27
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469062192 generated at 01/26/2010 10:51:40 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_28_708088679.arc
ORA-00280: change 10467469062192 for thread 1 is in sequence #28
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_27_708088679.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469064362 generated at 01/26/2010 11:43:01 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_29_708088679.arc
ORA-00280: change 10467469064362 for thread 1 is in sequence #29
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_28_708088679.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469064390 generated at 01/26/2010 11:43:54 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_30_708088679.arc
ORA-00280: change 10467469064390 for thread 1 is in sequence #30
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_29_708088679.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469064400 generated at 01/26/2010 11:44:17 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_31_708088679.arc
ORA-00280: change 10467469064400 for thread 1 is in sequence #31
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_30_708088679.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00308: cannot open archived log '/db1/archive/dbwrkev1/dbwrkev1_1_31_708088679.arc'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
 
 
idle> recover database until cancel using backup controlfile;
ORA-00279: change 10467469064400 generated at 01/26/2010 11:44:17 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_31_708088679.arc
ORA-00280: change 10467469064400 for thread 1 is in sequence #31
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
Media recovery cancelled.
idle> alter database open resetlogs;
 
Database altered.
 
idle> select count(*) from system.t;
 
  COUNT(*)
----------
        3


So, obviously we lost the changes recorded in the online redo log. This is an imcomplete recvoery, corresponding to lost all control files and (some of) data files, but online redo logs are intact. However, if we know the current online redo log at the time of media failure ( We can find it out in the alert log file), we probably can recover more.

For example, in my test, I knew the current online redo logs are those of group 2. So I did:

#> cp redo02a.log /db1/archive/dbwrkev1/dbwrkev1_1_31_708088679.arc

Then I repeated the steps 6,7,8 as in the test case 2, let's see what we got:

idle> recover database until cancel using backup controlfile;
ORA-00279: change 10467469061882 generated at 01/26/2010 10:45:48 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_27_708088679.arc
ORA-00280: change 10467469061882 for thread 1 is in sequence #27
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469062192 generated at 01/26/2010 10:51:40 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_28_708088679.arc
ORA-00280: change 10467469062192 for thread 1 is in sequence #28
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_27_708088679.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469064362 generated at 01/26/2010 11:43:01 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_29_708088679.arc
ORA-00280: change 10467469064362 for thread 1 is in sequence #29
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_28_708088679.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469064390 generated at 01/26/2010 11:43:54 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_30_708088679.arc
ORA-00280: change 10467469064390 for thread 1 is in sequence #30
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_29_708088679.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
ORA-00279: change 10467469064400 generated at 01/26/2010 11:44:17 needed for thread 1
ORA-00289: suggestion : /db1/archive/dbwrkev1/dbwrkev1_1_31_708088679.arc
ORA-00280: change 10467469064400 for thread 1 is in sequence #31
ORA-00278: log file '/db1/archive/dbwrkev1/dbwrkev1_1_30_708088679.arc' no longer needed for this recovery
 
 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
 
Log applied.
Media recovery complete.
idle> alter database open resetlogs;
 
Database altered.
 
idle> select count(*) from system.t;
 
  COUNT(*)
----------
         4
 
Bingo! It seemed I did a complete recovery!

[Jan 27,2010 updated: this post shows another way to do a complete revcovery in this situation]

Another thing I want to point out is that if we only lost data files whereas the control files and online redo logs are intact, we can do a complete recovery easily as long as we have archived logs. However, in our envionment when we restore snapshot backup, we also overwrite the current control files. So we must be cautious and it is thus advisable to copy the intact control files to some other location. It is probaly a good idea that we should not place all the control files in the same volumn as data files.

BTW, the information in the following two links gave me some ideas about this topic.

http://freekdhooge.wordpress.com/2007/12/03/disaster-recovery-troubles/
http://download.oracle.com/docs/cd/B10501_01/server.920/a96519/strategy.htm#1004884


Appendix - Scripts used:

#!/bin/ksh
# script:  hot_backup.sh
#
sqlplus "/ as sysdba" <<EOF
alter tablespace UNDOTBS1 begin backup;
alter tablespace USERS    begin backup;
alter tablespace SYSTEM   begin backup;
alter tablespace SYSAUX   begin backup;
EOF
echo " -- tablespace begin backup mode "

cp /db1/u01/oradata/dbwrkev1/system01.dbf    /db1/backup
cp /db1/u04/oradata/dbwrkev1/undotbs01.dbf   /db1/backup
cp /db1/u02/oradata/dbwrkev1/sysaux01.dbf    /db1/backup
cp /db1/u02/oradata/dbwrkev1/users01.dbf     /db1/backup
cp /db1/u05/archive/dbwrkev1/redo03a.log     /db1/backup
cp /db1/u06/archive/dbwrkev1/redo03b.log     /db1/backup
cp /db1/u05/archive/dbwrkev1/redo02a.log     /db1/backup
cp /db1/u06/archive/dbwrkev1/redo02b.log     /db1/backup
cp /db1/u05/archive/dbwrkev1/redo01a.log     /db1/backup
cp /db1/u06/archive/dbwrkev1/redo01b.log     /db1/backup
cp /db1/u01/oradata/dbwrkev1/control01.ctl   /db1/backup 
cp /db1/u02/oradata/dbwrkev1/control02.ctl   /db1/backup 
cp /db1/u03/oradata/dbwrkev1/control03.ctl   /db1/backup


echo " -- data file, control file and redo log backup done "

sqlplus "/ as sysdba" <<EOF
alter tablespace UNDOTBS1 end backup;
alter tablespace USERS    end backup;
alter tablespace SYSTEM   end backup;
alter tablespace SYSAUX   end backup;
EOF

echo " -- tablespace end backup mode "

sqlplus "/ as sysdba" <<EOF
alter system archive log current;
alter database backup controlfile to '/tmp/backupctrl.ctl' reuse;
EOF
echo " -- archive current redo log and backup controlfile done"

#!/bin/ksh
# script: restore_data.sh
#

cp /db1/backup/system01.dbf  /db1/u01/oradata/dbwrkev1 
cp /db1/backup/undotbs01.dbf /db1/u04/oradata/dbwrkev1 
cp /db1/backup/sysaux01.dbf  /db1/u02/oradata/dbwrkev1 
cp /db1/backup/users01.dbf   /db1/u02/oradata/dbwrkev1 

#!/bin/ksh
#   restore_ctl.sh 
#

cp /db1/backup/control01.ctl /db1/u01/oradata/dbwrkev1   
cp /db1/backup/control02.ctl /db1/u02/oradata/dbwrkev1   
cp /db1/backup/control03.ctl /db1/u03/oradata/dbwrkev1   

#!/bin/ksh
# script: restore_redo.sh
#
cp /db1/backup/redo03a.log   /db1/u05/archive/dbwrkev1    
cp /db1/backup/redo03b.log   /db1/u06/archive/dbwrkev1    
cp /db1/backup/redo02a.log   /db1/u05/archive/dbwrkev1    
cp /db1/backup/redo02b.log   /db1/u06/archive/dbwrkev1     
cp /db1/backup/redo01a.log   /db1/u05/archive/dbwrkev1     
cp /db1/backup/redo01b.log   /db1/u06/archive/dbwrkev1  


Wednesday, January 13, 2010

Clone an online database - applying standby feature

In a previous post, I described how to clone an online database by transferring data files through network. Since the cloned database should not be opened immediately after the copy, instead, it needed to be opened one week later. This was to minimize the maintenance window needed to configure the cloned database. In this scenario, we need to ship the archived logs from source to target, and also apply them in the target during the whole week in order to keep the source and target database in sync. For that purpose, I wrote a shell script and put it in the cron to automatically transfer the archived logs and then manually applying the logs.

Inspired by a fellow DBA recently, I have found that we can take advantage of Oracle standby technology to achieve the log transportation and apply purpose automatically. I thus did a test on a testing 10g database to demonstrate the concept, which consisted of the following steps:

1. Copied selected tablespaces from source to target, using the same method as in the previous post
    Tablespaces: SYSTEM, SYSAUX, USERS and UNDOTBS1

2. Created standby control file in source and copied them to the target.e.g
     a. At source, alter database create standby controlfile as '/tmp/stby.ctl';
     b. scp from source to target server
     c. At target, issue:
        cp /tmp/stby.ctl /db1/u01/oradata/orasid1/control01.ctl
        cp /tmp/stby.ctl /db1/u02/oradata/orasid1/control02.ctl
        cp /tmp/stby.ctl /db1/u03/oradata/orasid1/control03.ctl

3. Added in the init.ora a parameter: standby_file_management = manual
    Note: this is to take into account that data file directory may be different between source and targer server.

4. Mount the target database with the standby controlfile.
     I then did Offline drop all those data files that are not copied
   e.g.  alter database datafile 'filename' offline drop;

5. In mount mode, issue:
    SQL> RECOVER STANDBY DATABASE UNTIL CANCEL;
    Note: this is to apply all available archived logs upto the current time manually.

6. Started the managed recovery of the standby database at target server
    SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

7. In the source db issue:
    SQL> ALTER SYSTEM SET log_archive_dest_2='SERVICE=ORASID1_STBY LGWR ASYNC';

8. Verify archived log are shiped and applied automatically.
    for examples:
     SELECT SEQUENCE#,APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;

9. Opened the standby database as a regular operational database
    a. At source issue: ALTER SYSTEM SET log_archive_dest_2=''

    b. At target, cancel Redo Apply:
        ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
    c. At target issue: alter database activate physical standby database
    d. ALTER DATABASE OPEN;
    e. Drop those tablespace not copied and needed in the target server.