Datapump in parallel with a shell script

Cyrille Modiano

I recently came across an issue while exporting a huge partitioned table for a data migration: the export took forever, with no obvious reason.

After some research I found that parallelism wasn't working: only one Data Pump worker was active at a time. I then found the note explaining this behaviour (Doc ID 1467662.1).

A table with BASICFILE LOBs cannot be exported in parallel; Oracle advises converting the BASICFILE LOBs to SECUREFILE, which was not possible in my case.

The second solution was to run multiple export jobs in parallel, each one exporting a fraction of the table rows. In the note's example, Oracle uses a query on a range of the primary key. That was fine, but I needed a more automatic way to divide the rows based on the degree of parallelism I wanted to apply. I then found this blog post from Morten Jensen.

He uses the modulo of the rowid block number to divide the rows evenly based on a parallelism parameter, which was exactly what I was looking for. He did it with a Perl script; I preferred to write a shell script, since I have only a little knowledge of Perl. I created one script to run the export (in my case for a specific partition, but the script supports whole tables as well) and one script to import the generated dumpfiles in parallel.
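The core idea can be sketched in a few lines of shell: for a given degree of parallelism, each worker gets a WHERE predicate selecting the rowids whose block number has a given remainder. This is a minimal standalone sketch of the predicate generation only, outside any Data Pump call:

```shell
#!/bin/bash
# Sketch: generate one non-overlapping WHERE predicate per worker.
# Together the predicates cover every row exactly once, because every
# block number has exactly one remainder modulo PARALLEL.
PARALLEL=4
for i in $(seq 0 $((PARALLEL - 1))); do
  echo "where mod(dbms_rowid.rowid_block_number(rowid), ${PARALLEL}) = $i"
done
```

Because the split is by block rather than by row, the buckets are only approximately equal in size, but in practice the distribution is good enough to keep all workers busy.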

Datapump in parallel scripts

#!/bin/bash
#############################################################################
#  EXPORT IN PARALLEL WITH DATAPUMP
#############################################################################
#
# Description: This script will create multiple export job in parallel
#              to export table
#
#############################################################################
#
# Author: Cyrille MODIANO
#
# Version : 1.0 (2016-10-25)
# Version : 1.1 (2016-11-30) - Fixed mod = 0 values skipped during export
#
#############################################################################

BASE_SCHEMA=$1
BASE_TABLE=$2
PARALLEL=$3
PARTITION=$4

function usage(){
  echo "USAGE: 
	    Parameter 1 is the SCHEMA
	    Parameter 2 is the TABLE NAME
	    Parameter 3 is the DEGREE of parallelism
	    Parameter 4 (optional) is the partition (if any)"
}

if [ $# -lt 3 ]; then
	usage
	exit 1
fi

if [ $# -eq 4 ]; then
	PARFILE=${BASE_SCHEMA}_${BASE_TABLE}_${PARTITION}.par
	echo "tables=${BASE_SCHEMA}.${BASE_TABLE}:${PARTITION}" >  $PARFILE
	START_MESSAGE="Beginning export of partition : ${BASE_SCHEMA}.${BASE_TABLE}:${PARTITION} "
	END_MESSAGE="Finished export of partition: ${BASE_SCHEMA}.${BASE_TABLE}:${PARTITION}"
	DUMPFILE_BASE=${BASE_SCHEMA}_${BASE_TABLE}_${PARTITION}
	LOGFILE_BASE=${BASE_SCHEMA}_${BASE_TABLE}_${PARTITION}
else
	PARFILE=${BASE_SCHEMA}_${BASE_TABLE}.par
	echo "tables=${BASE_SCHEMA}.${BASE_TABLE}" >  $PARFILE
	START_MESSAGE="Beginning export of table : ${BASE_SCHEMA}.${BASE_TABLE}"
	END_MESSAGE="Finished export of table: ${BASE_SCHEMA}.${BASE_TABLE}"
	DUMPFILE_BASE=${BASE_SCHEMA}_${BASE_TABLE}
	LOGFILE_BASE=${BASE_SCHEMA}_${BASE_TABLE}
fi

# Adding parameters to the parfile
echo "directory=DATA_PUMP" >>  $PARFILE
echo "EXCLUDE=STATISTICS" >>  $PARFILE
echo "CLUSTER=N" >>  $PARFILE

echo "#########################################################################################"
echo $START_MESSAGE
echo "#########################################################################################"
echo " "

LIMIT=$(expr $PARALLEL - 1)

START_TIME=`date`

for i in `seq 0 $LIMIT`
do
   QUERY="where mod(dbms_rowid.rowid_block_number(rowid), ${PARALLEL}) = $i"
   expdp userid=\'/ as sysdba\' query="$BASE_SCHEMA.$BASE_TABLE:\"$QUERY\"" dumpfile=${DUMPFILE_BASE}_${i}.dmp logfile=${LOGFILE_BASE}_${i}.log parfile=$PARFILE &
   sleep 3
done

# wait for all background expdp jobs started by this script to finish
wait

echo "#########################################################################################"
echo $END_MESSAGE
echo "# Start time : $START_TIME "
echo "# End time is: `date`"
echo "#########################################################################################"
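For reference, the parfile generated for a partition export would contain something like the following (schema, table, and partition names are placeholders):

```
tables=SCOTT.BIG_TABLE:P2016
directory=DATA_PUMP
EXCLUDE=STATISTICS
CLUSTER=N
```

All the concurrent expdp processes reuse this same parfile; they only differ in their query, dumpfile, and logfile arguments on the command line.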

and the import script:

#!/bin/bash
#############################################################################
#  IMPORT IN PARALLEL WITH DATAPUMP
#############################################################################
#
# Description: This script will import multiple dumpfiles in parallel
#
#############################################################################
#
# Author: Cyrille MODIANO
#
# Version : 1.0 (2016-10-25)
# Version : 1.1 (2016-12-12) - added support for working directories and notifications
#
#############################################################################
export ORAENV_ASK=NO
export ORACLE_SID=$1
. oraenv

TABLE_NAME=$2
PARTITION=$3

function usage(){
  echo "USAGE: 
	    Parameter 1 is the SID of the database where you want to import
	    Parameter 2 is the TABLE you want to import
	    Parameter 3 (optional) is the PARTITION name you want to import (if any)"
}

if [ $# -lt 2 ]; then
	usage
	exit 1
fi

if [ $# -eq 3 ]; then
	PARFILE=${TABLE_NAME}_${PARTITION}.par
	START_MESSAGE="Beginning import of partition : ${TABLE_NAME}:${PARTITION} "
	END_MESSAGE="Finished import of partition: ${TABLE_NAME}:${PARTITION}"
	SEARCH_PATTERN=${TABLE_NAME}_${PARTITION}
	SUCCESS_MESSAGE="partition: ${TABLE_NAME}:${PARTITION} successfully imported, started at"
	ERROR_MESSAGE="partition: ${TABLE_NAME}:${PARTITION} failed to import, check logfile for more info"
	MAIL_OBJECT="Successfully imported partition ${TABLE_NAME}:${PARTITION}"
else
	PARFILE=${TABLE_NAME}.par
	START_MESSAGE="Beginning import of table : ${TABLE_NAME}"
	END_MESSAGE="Finished import of table : ${TABLE_NAME}"
	SEARCH_PATTERN=${TABLE_NAME}
	SUCCESS_MESSAGE="Table ${TABLE_NAME} successfully imported, started at "
	ERROR_MESSAGE="Table ${TABLE_NAME} failed to import, check logfile for more info"
	MAIL_OBJECT="Successfully imported table ${TABLE_NAME}"
fi

#directories
BASEDIR=/u10/
DUMPDIR=$BASEDIR/DUMP
PARFILEDIR=$BASEDIR/parfiles

mkdir -p $PARFILEDIR

# building the parfile
echo "DIRECTORY=MY_DUMP_DIR" >  ${PARFILEDIR}/$PARFILE
echo "CLUSTER=N" >>  ${PARFILEDIR}/$PARFILE
echo "TABLE_EXISTS_ACTION=APPEND" >>  ${PARFILEDIR}/$PARFILE
echo "DATA_OPTIONS=DISABLE_APPEND_HINT" >>  ${PARFILEDIR}/$PARFILE


echo "#########################################################################################"
echo $START_MESSAGE
echo "#########################################################################################"
echo " "

START_TIME=`date`

for dump in `ls ${DUMPDIR}/*${SEARCH_PATTERN}*.dmp`
do
   DUMPFILE=$(basename ${dump})
   LOGFILE=imp_${DUMPFILE%.dmp}.log
   impdp userid=\'/ as sysdba\' dumpfile=${DUMPFILE} logfile=${LOGFILE} parfile=${PARFILEDIR}/$PARFILE &
   sleep 3
done

# wait for all background impdp jobs started by this script to finish
wait

echo "#########################################################################################"
echo $END_MESSAGE
echo "# Start time : $START_TIME "
echo "# End time   : `date`"
echo "#########################################################################################"

# Verifying errors
# impdp writes its logfiles into the Data Pump directory (assumed here to be $DUMPDIR)
errors_count=`grep ORA- ${DUMPDIR}/*${SEARCH_PATTERN}*.log | wc -l`

if [ $errors_count -eq 0 ]; then
        echo "$SUCCESS_MESSAGE $START_TIME and finished at `date`" |  mail -s "$MAIL_OBJECT" you@your-email.com
else
        echo "$ERROR_MESSAGE" |  mail -s "Import of ${TABLE_NAME} failed" you@your-email.com
fi
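The final check simply counts ORA- lines across the matching logfiles. Here is a minimal standalone sketch of that logic using synthetic logs (the file names and contents are made up for illustration):

```shell
#!/bin/bash
# Create two fake import logs, one containing an error, then count ORA- lines
workdir=$(mktemp -d)
printf 'ORA-31693: Table data object failed to load\n' > "$workdir/imp_T1_0.log"
printf 'Job successfully completed\n'                  > "$workdir/imp_T1_1.log"
errors_count=$(grep ORA- "$workdir"/imp_T1_*.log | wc -l)
echo "$errors_count"   # 1
```

Note that this counts every ORA- line, so a single failed worker can contribute several matches; the script only cares whether the count is zero.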

This solution is a workaround for the issue mentioned above, but it can also be used on Standard Edition to get parallelism with Data Pump, which is otherwise only available to Enterprise Edition users.
To use these scripts you will have to change the DIRECTORY to match the one you are using.
You can also customize the parfile options; in my case I used the following:

  • TRANSFORM=LOB_STORAGE:SECUREFILE (to convert BASICFILE LOBs to SECUREFILE while importing)
  • TABLE_EXISTS_ACTION=APPEND (because I was importing partitions and the table already existed)
  • DATA_OPTIONS=DISABLE_APPEND_HINT (mandatory when multiple jobs import into the same table at the same time, as mentioned in the Oracle note)
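Put together, an import parfile with these options would look like this (the directory name is whatever your DIRECTORY object is called):

```
DIRECTORY=MY_DUMP_DIR
CLUSTER=N
TABLE_EXISTS_ACTION=APPEND
DATA_OPTIONS=DISABLE_APPEND_HINT
TRANSFORM=LOB_STORAGE:SECUREFILE
```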

Thanks for reading.

12 thoughts on “Datapump in parallel with a shell script”

  1. Hi, how do I write the connection string for a container/pluggable database? Assume the pluggable database is called mypdb.
    The connect string “/ as sysdba”@mypdb : this syntax does not work.
    The connect string “/@mypdb as sysdba” : this syntax does not work.
    It only works if I put a user explicitly instead of the “/”.
    The connect string “system/password@mypdb as sysdba” : this syntax does work.
    But I don’t want to use sys, system, or any other user. I need to use “/ as sysdba”; how do I make it work with a pluggable database connect string?
    Thanks in advance

  2. Cyrille, great job. Can you take a schema backup in parallel mode? And if yes, what is the command?

    I have tried:
    Script_name.sh schema_name 5
    Not working :) … I am on SE

    1. Hi,

      This script was made to export a table or a table partition, but it is possible to adapt it to export a whole schema: you will need to add a step to loop through the schema's tables.

      Thanks

  3. Your code is too sophisticated for me to add a single line. I would be most grateful if you could change the code; I would really appreciate it.
