Sqoop – Load Data to HDFS from Oracle

Overview:

Apache Sqoop is a command-line tool for loading data from RDBMS databases into Hadoop and vice versa. The name Sqoop originated from SQL and Hadoop, taking "SQ" from SQL and "OOP" from HadOOP. Sqoop uses MapReduce to import and export data, which allows the jobs to run in parallel.

Now we will see how to set up Sqoop on a Hadoop cluster and load data into HDFS from Oracle.

Download the software:

The software for Apache Sqoop can be downloaded from http://www.apache.org/dyn/closer.lua/sqoop/

Prerequisites:

  1. Java should be in place
  2. Hadoop must already be set up, up and running (quick checks below)
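
A quick way to verify both prerequisites (assuming the hadoop user's environment is already configured; output will vary with your versions):

[hadoop@node1 ~]$ java -version

[hadoop@node1 ~]$ hadoop version

[hadoop@node1 ~]$ jps

jps should list the Hadoop daemons (NameNode, DataNode, ResourceManager, etc.) if the cluster is up.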

In this exercise, I'm using RHEL 7.2 as the OS, Java 1.7.0_80, and Hadoop 2.7.2.

So, select the Sqoop build for Hadoop 2.x; in this exercise the file is sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz.


Once the software is downloaded and scp'd to the server, perform the installation steps below.

Setup Sqoop:

Uncompress the Sqoop software:

[hadoop@node1 hadoop]$ gunzip sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

[hadoop@node1 hadoop]$ tar -xvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar

Rename the extracted Sqoop directory to something simpler to navigate:

[hadoop@node1 hadoop]$ mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop-1.4.6

Set SQOOP_HOME and add the Sqoop executable to the PATH:

export SQOOP_HOME=/mnt/oracle/hadoop/sqoop-1.4.6

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SQOOP_HOME/bin

Make sure you set these variables in .bash_profile so they persist across logins, then validate the paths as below.
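
To pick up the changes in your current session without logging out again (assuming a bash login shell):

[hadoop@node1 ~]$ source ~/.bash_profile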

[hadoop@node1 conf]$ echo $SQOOP_HOME

/mnt/oracle/hadoop/sqoop-1.4.6

[hadoop@node1 conf]$ which sqoop

/mnt/oracle/hadoop/sqoop-1.4.6/bin/sqoop

Navigate to the $SQOOP_HOME/conf directory and add HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME to Sqoop's configuration file, sqoop-env.sh:

[hadoop@node1 ~]$ cd $SQOOP_HOME/conf

[hadoop@node1 conf]$ cp -pr sqoop-env-template.sh sqoop-env.sh

[hadoop@node1 conf]$ vi sqoop-env.sh

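A minimal sqoop-env.sh might look like the following, assuming Hadoop is installed under /mnt/oracle/hadoop/hadoop-2.7.2 (adjust to your own installation path):

# Assumed Hadoop install path for this exercise; point both at your Hadoop installation
export HADOOP_COMMON_HOME=/mnt/oracle/hadoop/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/mnt/oracle/hadoop/hadoop-2.7.2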

OJDBC Driver file:

As Sqoop is mainly meant for importing/exporting data between RDBMSs and Hadoop, we need to download the Oracle JDBC driver jar and place it in $SQOOP_HOME/lib (in this exercise we only discuss importing data from an Oracle database).

Download the ojdbc6.jar file from http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html

[hadoop@node1 lib]$ cp -pr /mnt/oracle/hadoop/software/ojdbc6.jar .

[hadoop@node1 lib]$ pwd

/mnt/oracle/hadoop/sqoop-1.4.6/lib
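
You can confirm the driver jar is now visible to Sqoop:

[hadoop@node1 lib]$ ls -l $SQOOP_HOME/lib/ojdbc6.jar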

You can issue the below command to check that the Sqoop version and paths are all set:

[hadoop@node1 bin]$ sqoop-version

17/03/08 00:35:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

Sqoop 1.4.6
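
Before attempting a full import, it is worth a sanity check that Sqoop can reach the database through the new driver. A quick test, reusing the same connection details as the import below (assuming the listener on oracledb:1521 is reachable from this node):

[hadoop@node1 ~]$ sqoop list-tables --connect jdbc:oracle:thin:@oracledb:1521/gbid --username scott --password tiger

If you prefer not to put the password on the command line, use -P to be prompted for it interactively.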

Loading Data to HDFS from Oracle:

Create a directory in HDFS:

hadoop fs -mkdir -p /mnt/oracle/hadoop/hadoopdata/testdata
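
You can verify the directory exists (note this is an HDFS path, not a local filesystem path):

hadoop fs -ls /mnt/oracle/hadoop/hadoopdata/testdata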

Import data into HDFS using a target directory:

sqoop import --connect jdbc:oracle:thin:@oracledb:1521/gbid --username scott --password tiger --table LOC_CURR_CODE -m 1 --target-dir /mnt/oracle/hadoop/hadoopdata/testdata/LOC_CURR_CODE

This starts a MapReduce job and finally prints output like the below:

Bytes Written=1151

17/03/08 12:44:59 INFO mapreduce.ImportJobBase: Transferred 1.124 KB in 19.0649 seconds (60.3726 bytes/sec)

17/03/08 12:44:59 INFO mapreduce.ImportJobBase: Retrieved 103 records.
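
Here -m 1 runs the whole import through a single mapper. To take advantage of the MapReduce parallelism mentioned in the overview, you can raise the mapper count and give Sqoop a column to split the work on. A sketch with hypothetical names (CURR_CODE_ID stands in for a numeric, evenly distributed key column in your table, and the target directory must not already exist):

# CURR_CODE_ID is a hypothetical split column; substitute your table's numeric key
sqoop import --connect jdbc:oracle:thin:@oracledb:1521/gbid --username scott --password tiger --table LOC_CURR_CODE --split-by CURR_CODE_ID -m 4 --target-dir /mnt/oracle/hadoop/hadoopdata/testdata/LOC_CURR_CODE_4M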

Read Data from HDFS:

To read the loaded data back from HDFS, you may issue the below command:

hadoop fs -cat /mnt/oracle/hadoop/hadoopdata/testdata/LOC_CURR_CODE/part*

This completes loading data to HDFS from Oracle Database using Sqoop.

 
