Monday, September 09, 2013

My Experience of Data Pump Export and Import for Database Migration

Recently I carried out a test of using the Data Pump Export and Import utilities to migrate a 17 TB database from Solaris to Linux, and at the same time from a single-instance database on a conventional file system to a two-node RAC on ASM. During the exercise I learned and became more familiar with quite a few interesting features and options of Data Pump Export and Import, and encountered a few difficulties as well.

The first obstacle I met was that I was unable to run Data Pump Export directly to NFS storage. We have NFS storage that is shared between the Solaris and Linux servers. At first I tried to export to the NFS storage directly, but unfortunately the job just hung for a long time. Having researched this issue for a while and tried the options mentioned in a MOS note (see Reference [1]) with the help of an SA, I still could not get it to work. Finally we decided to export to the local file system first and then move the dump files to the NFS storage. I was able to move about 150 GB of dump files per hour. Fortunately, Data Pump Import from the NFS storage did not pose any problems.
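
For reference, the kind of NFS mount options commonly recommended for Oracle dump file access on Linux look like the line below. This is only a generic illustration (the server name and paths are made up); the exact options for a given platform should be taken from the MOS note [1]:

 mount -t nfs -o rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0 \
       nfs-server:/export/dpump /mnt/dpump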

Table-mode Data Pump Export and Import were used in the test. Large tables were typically exported or imported individually, and smaller tables were grouped together into several batches. For each large table or group of smaller tables, I prepared the corresponding parameter files, with the PARALLEL option where appropriate. In the Import jobs I always excluded indexes, constraints and statistics. I used the SQLFILE option of the impdp command to extract the DDL related to the tables, including the create table, index and constraint statements. That DDL was executed after the Import, typically with a parallel clause for index creation and ENABLE NOVALIDATE for the constraints. The database migration therefore really consisted of multiple table-migration jobs. For each of them I performed and recorded the timing of the following steps: export, move the dump files to NFS, import, and run the DDL for indexes etc. These table-migration jobs overlapped, as two or three of them were executed at the same time and could be in different steps.
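
As a rough sketch of the DDL step, using the sample dump file names from Reference [2] (the output file name here is made up):

 # extract the DDL (create table, index, constraint, grant statements etc.)
 # from the dump file without loading any data
 impdp / DIRECTORY=dpump_dir DUMPFILE=email%U_q2.dmp SQLFILE=email_ddl.sql

 # typical edits before running the extracted DDL after the data import:
 #   CREATE INDEX ...                     -> add a PARALLEL clause
 #   ALTER TABLE ... ADD CONSTRAINT ...   -> append ENABLE NOVALIDATE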

Shortly after the start of the test I found that Data Pump Export was extremely slow on a big table with a LOB column. The export job was unable to run in parallel regardless of the PARALLEL setting. The segment sizes of this big table are 237 GB for the table and 382 GB for the LOB. The table uses a sequence-based PK column called "EMAIL_SEQ_ID". To speed up the export, I started 23 export jobs at the same time, each exporting a particular range of rows based on the PK. The key is the QUERY option of the expdp command, which takes a WHERE clause specifying the PK range. In the end I was able to export this table in 30 hours, move the 420 GB of dump files in 2 hours, import in about 15 hours and execute the DDL for indexes etc. in 2 hours. It is worth noting that although we can parallelize the export of a LOB table with this approach, we cannot do the same for the import: the import job holds a TM lock on the table, so only one import job can run at a time. I used this approach for several big LOB tables. The Appendix shows sample shell scripts that prepare the expdp and impdp parfiles. By the way, I later found that the inability to export a LOB table in parallel is a known issue in the Oracle community (see Reference [3]). A blog post suggested a ROWID-based approach to split the rows of a LOB table (see Reference [4]).
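
Those range-based jobs were simply launched side by side. A minimal sketch of starting the parfiles generated by the Appendix script (file names and ranges must match what the script produced):

 #!/bin/ksh
 # start one expdp job per PK-range parfile in the background
 i=21
 while [[ $i -le 43 ]];do
   nohup expdp / parfile=EMAIL_CONTENT_q$i.par > EMAIL_CONTENT_q$i.out 2>&1 &
   ((i=i+1))
 done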

Occasionally I made mistakes and needed to terminate an export or import job. I found it very convenient to use the KILL_JOB command in interactive mode. First I need to know the job name, either from the log file or from the dba_datapump_jobs view. Then I can enter interactive mode using the attach option:

expdp / attach=[job_name]

Then I issued the KILL_JOB command. Note that when KILL_JOB is used on an export job, all of its dump files will be wiped out.
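
A quick sketch of the sequence, using the job name from the sample parfile in Reference [2]:

 SQL> select owner_name, job_name, state from dba_datapump_jobs;

 $ expdp / attach=exp_email_q2
 Export> KILL_JOB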

During the test I also learned that Data Pump Import can be done through a database link. When a source database link is specified with the NETWORK_LINK option of the impdp command, the import job retrieves the data over the db link and writes it directly into the target database; there are no dump files involved.
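
A minimal parameter file for such a network import might look like the following sketch; SOURCE_DB is an assumed db link pointing at the source (Solaris) database:

 -- impdp_netlink.par (sketch) ----

 DIRECTORY=dpump_dir
 NETWORK_LINK=SOURCE_DB
 TABLES=TTQ_GMPX.TTQ_EMAIL_EVENT_DETAILS
 TABLE_EXISTS_ACTION=APPEND
 LOGFILE=dpump_dir:email_netlink_imp.log
 EXCLUDE=index,constraint,statistics
 # note: no DUMPFILE -- rows are pulled over the db link and inserted directly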

I mentioned previously that the SQLFILE option can be used to extract DDL, but this option is only available for impdp. Suppose I have a table in a database and I want to extract the DDL related to it, including the indexes, constraints, object grants, comments, triggers etc. associated with it; how can I do that? Previously I always used the DBMS_METADATA package for such tasks. But if we create a db link pointing to the database itself, we can use impdp with the NETWORK_LINK and SQLFILE options to accomplish this very neatly.
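
A sketch of this trick (the db link name, credentials and output file name below are made up):

 # 1. in SQL*Plus, create a db link that points back to the same database, e.g.
 #      create database link self_link connect to scott identified by tiger using 'ORCL';
 # 2. let impdp pull the metadata over that link and write only the DDL -- no dump file needed
 impdp / NETWORK_LINK=self_link TABLES=TTQ_GMPX.TTQ_EMAIL_EVENT_DETAILS \
         DIRECTORY=dpump_dir SQLFILE=ttq_email_event_details_ddl.sql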

Using export/import to migrate the database has at least one advantage over the Transportable Tablespace approach: it allows the re-organization of tables and indexes, i.e. it is easy to move tables and indexes into different tablespaces if desired.
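
For example, the REMAP_TABLESPACE parameter of impdp can be repeated in the parfile to relocate segments during the import; the tablespace names below are made up:

 REMAP_TABLESPACE=OLD_DATA_TS:NEW_DATA_TS
 REMAP_TABLESPACE=OLD_INDEX_TS:NEW_INDEX_TS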

I completed the 17 TB database migration, which came to about 12 TB in the target database, in about a month; the test was interrupted by other tasks with higher priority. In total I did 17 table-migration batches to complete the whole database migration. Ordered by end-to-end time (including the export, move-to-NFS, import and DDL steps), they are referred to as batches 1 to 17 here:

batch   time (hours) 
-----   ----------- 
1        50
2        30
3        25
4        24
5        21
6        16
7        15
8        14
9        11
10       10
11       10
12       9.5
13       5.5
14       5
15       3.5
16       3
17       3
------------



Based on the above timings, if I perform the migration tasks as continuously as possible, I should be able to complete the whole migration in 5-7 days. The ultimate goal is a zero-downtime migration. We plan to use Oracle GoldenGate to capture the changes at the source during the database migration, so my next task will be investigating whether this is possible in practice in our environment.

References:
[1] ORA-27054: NFS file system where the file is created or resides is not mounted with correct options [ID 781349.1]
[2] Sample Export and Import parameter files:
 -- expdp.par  ----


 DIRECTORY=dpump_dir3
 DUMPFILE=dpump_dir3:email%U_q2.dmp
 TABLES=TTQ_GMPX.TTQ_EMAIL_EVENT_DETAILS
 LOGFILE=dpump_dir3:TTQ_EMAIL_EVENT_DETAILS_q2.log
 #PARALLEL=20
 FILESIZE=5G
 JOB_NAME=exp_email_q2
 QUERY="WHERE email_seq_id >=2*200000000   and email_seq_id < 3*200000000"


  -- impdp.par ----

 DIRECTORY=dpump_dir
 DUMPFILE=dpump_dir:email%U_q2.dmp
 TABLES=TTQ_GMPX.TTQ_EMAIL_EVENT_DETAILS
 TABLE_EXISTS_ACTION=APPEND
 LOGFILE=dpump_dir:TTQ_EMAIL_EVENT_DETAILS_q2_imp.log
 PARALLEL=2
 JOB_NAME=imp_email_q2
 EXCLUDE=index,constraint,statistics
 #EXCLUDE=statistics
 transform=storage:n,oid:n
 CLUSTER=NO

[3] Bug 5599947 - Export Data Pump is slow when table has a LOB column ("DATAPUMP EXPORT VERY SLOW")

[4] http://jensenmo.blogspot.com/2012/10/optimising-data-pump-export-and-import.html

[5] Some other references:
Master Note for Data Pump [ID 1264715.1]
Checklist For Slow Performance Of DataPump Export (expdp) And Import (impdp) [ID 453895.1]
Parallel Capabilities of Oracle Data Pump [ID 365459.1]


Appendix - sample shell scripts to generate expdp/impdp par files for a big lob table

#!/bin/ksh
# generate one expdp parfile per PK range (ranges 21-43) for the big LOB table
i=21
while [[ $i -le 43 ]];do
 
  ((j=i+1))
  PARFILE="EMAIL_CONTENT_q$i.par"
  echo "DIRECTORY=dpump_dir3" > $PARFILE 
  echo "DUMPFILE=dpump_dir3:ecntnt_q${i}_%U.dmp" >> $PARFILE 
  echo "TABLES=TTQ_GMPX.EMAIL_CONTENT" >> $PARFILE
  echo "LOGFILE=dpump_dir3:EMAIL_CONTENT_q$i.log" >> $PARFILE 
  echo "FILESIZE=10G" >> $PARFILE
  echo "JOB_NAME=exp_ecntnt_q$i" >> $PARFILE
  echo "QUERY=\"WHERE email_seq_id > $i * 2000000  and email_seq_id <= $j * 2000000\"">> $PARFILE
 
 
  echo "i=$i"
  echo "j=$j"
  ((i=i+1))
done

--- sample script to generate impdp par file for big lob table -----------------

#!/bin/ksh
# generate one impdp parfile per PK range for the big LOB table
i=6
while [[ $i -le 43 ]];do
 
  ((j=i+1))
  PARFILE="EMAIL_CONTENT_imp_q$i.par"
  echo "DIRECTORY=dpump_dir" > $PARFILE 
  echo "DUMPFILE=ecntnt_q${i}_%U.dmp" >> $PARFILE 
  echo "TABLES=TTQ_GMPX.EMAIL_CONTENT" >> $PARFILE
  echo "TABLE_EXISTS_ACTION=APPEND" >> $PARFILE
  echo "LOGFILE=EMAIL_CONTENT_q${i}_imp.log" >> $PARFILE 
  echo "JOB_NAME=imp_ecntnt_q${i}" >> $PARFILE
  echo "#EXCLUDE=index,constraint,statistics" >> $PARFILE
  echo "EXCLUDE=statistics" >> $PARFILE 
  echo "transform=storage:n,oid:n" >> $PARFILE
  echo "CLUSTER=NO" >> $PARFILE
 
 
  echo "i=$i"
  echo "j=$j"
  ((i=i+1))
done

