Hadoop 2.7.2 Pseudo Distributed Install on RHEL 7.2

Installing Hadoop is straightforward if we split the work into a few main categories.

Pre-Install Steps:

  1. User/Group Creation for Hadoop Install
  2. Passwordless SSH between the server(s)
  3. Download the Hadoop & Java Software

Install Steps:

  1. Set the Path in the env file/.bash_profile
  2. Update the Configuration files with respective details
  3. Format the HDFS
  4. Start the services

Post-Install Steps:

  1. Navigate through the URLs to verify the health of the environment
  2. Run a sample MR (Map Reduce) job to validate the setup

Let's go through them one step at a time.

Pre-Install Steps:

User/Group Creation:

On Linux, as the root user, create the user and the group, then assign the user to the group:

# useradd hadoop

# groupadd hinstall

# usermod -aG hinstall hadoop

Set a password for the user:

# passwd hadoop


Setup Password less SSH:

[hadoop@node1 ~]$ ssh-keygen -t rsa

[hadoop@node1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[hadoop@node1 ~]$ chmod 644 ~/.ssh/authorized_keys

Make sure ~/.ssh itself is mode 700; sshd ignores the key if the directory is writable by others. Test with "ssh localhost date" — it should not prompt for a password.

Download the Hadoop & Java Software:

Launch the URL http://www.oracle.com/technetwork/java/javase/downloads/index.html to download the latest Java (JDK).

Download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/common/ (for example, via the mirror http://apache.claz.org/hadoop/common/).

Once the download is complete, copy the archive to the server on which the Hadoop setup is planned.

This completes the Pre-Install steps.


Install Steps:

Unzip the software:

[hadoop@node1 ~]$ cd /mnt/oracle/hadoop

[hadoop@node1 hadoop]$ gunzip hadoop-2.7.2.tar.gz

[hadoop@node1 hadoop]$ tar -xvf hadoop-2.7.2.tar

Set the Path:

export JAVA_HOME=/usr/java/jdk1.7.0_80

export PATH=$PATH:$JAVA_HOME/bin


Make sure to add the above lines to the ".bash_profile" of the hadoop user, then verify:

[hadoop@node1 ~]$ echo $JAVA_HOME


Add the below path in ".bash_profile" of the hadoop user,

export HADOOP_HOME=/mnt/oracle/hadoop/hadoop-2.7.2

Cross-verify the same by executing the below,

[hadoop@node1 ~]$ echo $HADOOP_HOME
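In addition to HADOOP_HOME, it is common to extend PATH so the Hadoop launch scripts can be invoked directly. A minimal sketch of the extra ".bash_profile" lines, assuming the install layout above (the HADOOP_CONF_DIR line is an optional convenience, not from the original screenshots):

```shell
# Hypothetical .bash_profile additions; adjust HADOOP_HOME if the tarball
# was unpacked elsewhere.
export HADOOP_HOME=/mnt/oracle/hadoop/hadoop-2.7.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Put the hadoop/hdfs/yarn launch scripts on the PATH.
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After sourcing the profile, "which hdfs" and "which start-dfs.sh" should both resolve to locations under $HADOOP_HOME.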



Configuration Files Updates:

The configuration files are located in $HADOOP_HOME/etc/hadoop.


Set the paths in different configuration files as below,


hadoop-env.sh

This file sets the JAVA_HOME used by the Hadoop daemons; point it at the same JDK as before,

export JAVA_HOME=/usr/java/jdk1.7.0_80



core-site.xml

This file contains the details below,

- Configuration port for the Hadoop instance
- Memory that needs to be allocated for the Hadoop instance
- Size of the filesystem that can be allocated
- Size of the read and write buffers

[hadoop@node1 hadoop]$ cp -pr core-site.xml core-site.xml_orig

[hadoop@node1 hadoop]$ vi core-site.xml


<configuration>
   <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

(fs.default.name is deprecated in Hadoop 2.x; the current property name is fs.defaultFS, but both are accepted.)



hdfs-site.xml

This file contains the NameNode and DataNode storage locations and the replication factor.

[hadoop@node1 hadoop]$ cp -pr hdfs-site.xml hdfs-site.xml_orig

[hadoop@node1 hadoop]$ vi hdfs-site.xml
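The file contents shown in the original are missing; below is a minimal sketch of a pseudo-distributed hdfs-site.xml, assuming a replication factor of 1 and hypothetical storage directories under /mnt/oracle/hadoop (adjust the paths to your own layout):

```xml
<configuration>
   <!-- Single node, so keep one copy of each block. -->
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <!-- Assumed local directories for NameNode and DataNode storage. -->
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/mnt/oracle/hadoop/hadoopdata/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/mnt/oracle/hadoop/hadoopdata/hdfs/datanode</value>
   </property>
</configuration>
```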



[hadoop@node1 hadoop]$ cp -pr yarn-site.xml yarn-site.xml_orig

[hadoop@node1 hadoop]$ vi yarn-site.xml
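Again, the contents from the original are missing; for a single-node setup, yarn-site.xml typically only needs the MapReduce shuffle auxiliary service enabled (a sketch, not the original screenshot):

```xml
<configuration>
   <!-- Enable the shuffle service NodeManagers need to run MapReduce jobs. -->
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>
```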



[hadoop@node1 hadoop]$ cp -pr mapred-site.xml.template mapred-site.xml

[hadoop@node1 hadoop]$ vi mapred-site.xml
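The contents shown in the original are missing; mapred-site.xml typically just tells MapReduce to run on YARN (a sketch):

```xml
<configuration>
   <!-- Run MapReduce jobs on the YARN framework rather than locally. -->
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
</configuration>
```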


With this, the configuration changes in the files are complete.


Format HDFS:

Just to check whether the path is set right, give the below command,

[hadoop@node1 hadoop]$ which hdfs


Below is the command to format the namenode,

[hadoop@node1 hadoop]$ hdfs namenode -format

Once the above command is given, output like the below follows,

17/03/07 09:58:34 INFO namenode.NameNode: STARTUP_MSG:


STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = node1.diebold.com/

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 2.7.2

STARTUP_MSG:   classpath = /mnt/oracle/hadoop/hadoop-2.7.2/etc/hadoop:/mnt/oracle/hadoop/hadoop-2.7.2/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/mnt/oracle/hadoop/hadoop-2.7.2/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/mnt/oracle/hadoop/hadoop-2.7.2/share/hadoop/common/lib/commons-configuration



17/03/07 09:58:36 INFO namenode.NameNode: SHUTDOWN_MSG:


SHUTDOWN_MSG: Shutting down NameNode at node1.diebold.com/


Once the formatting is done, go ahead and start the services.

Start Services:

Just to check which hosts are configured as NameNodes,

$ hdfs getconf -namenodes


Start services of HDFS and YARN:

[hadoop@node1 hadoop]$ start-dfs.sh

Starting namenodes on [node1]

node1: starting namenode, logging to /mnt/oracle/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-namenode-node1.out

localhost: starting datanode, logging to /mnt/oracle/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-datanode-node1.out

Starting secondary namenodes []

starting secondarynamenode, logging to /mnt/oracle/hadoop/hadoop-2.7.2/logs/hadoop-hadoop-secondarynamenode-node1.out

Once the startup returns to the prompt, run "jps" to check the Java services running.

[hadoop@node1 hadoop]$ jps

22418 DataNode

22648 SecondaryNameNode

22251 NameNode

22842 Jps

Start Yarn Services:

[hadoop@node1 hadoop]$ start-yarn.sh

starting yarn daemons

starting resourcemanager, logging to /mnt/oracle/hadoop/hadoop-2.7.2/logs/yarn-hadoop-resourcemanager-node1.out

localhost: starting nodemanager, logging to /mnt/oracle/hadoop/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-node1.out

Issue "jps" again to check that the YARN services have also started,

[hadoop@node1 hadoop]$ jps

23090 NodeManager

22418 DataNode

23497 Jps

22949 ResourceManager

22648 SecondaryNameNode

22251 NameNode


This completes the start of services.

Post-Install Steps:

Navigate through the URLs:

Launch the URL http://node1:50070 for the NameNode web UI.


To get the cluster-related information, launch the ResourceManager UI at http://node1:8088.



For the NodeManager, the URL is http://node1:8042.



Run a Simple Job to Validate the Setup:

Navigate to $HADOOP_HOME/share/hadoop/mapreduce


Make sure the path is still set correctly,

[hadoop@node1 mapreduce]$ which yarn


Invoke a sample job using the command below,

[hadoop@node1 mapreduce]$ yarn jar hadoop-mapreduce-examples-2.7.2.jar pi 16 1000



Number of Maps  = 16

Samples per Map = 1000

17/03/07 23:16:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Wrote input for Map #0

Wrote input for Map #1

Wrote input for Map #2

Starting Job

17/03/07 23:16:45 INFO client.RMProxy: Connecting to ResourceManager at /

17/03/07 23:16:46 INFO input.FileInputFormat: Total input paths to process : 16

17/03/07 23:16:46 INFO mapreduce.JobSubmitter: number of splits:16

17/03/07 23:16:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488945045915_0001

17/03/07 23:16:47 INFO impl.YarnClientImpl: Submitted application application_1488945045915_0001

17/03/07 23:16:47 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1488945045915_0001/

17/03/07 23:16:47 INFO mapreduce.Job: Running job: job_1488945045915_0001

Job Finished in 28.976 seconds

Estimated value of Pi is 3.14250000000000000000


This validates the setup with a sample MR job. To check the status of the job execution from the front end,

launch http://node1:8088


This completes the Install, Configuration and testing of Hadoop 2.7.2 Pseudo Distributed setup on RHEL 7.2.

Thank you.



