Spark Setup on RHEL 7.2 with Apache Hadoop 2.7

Spark exposes multiple APIs (Scala, Java, Python, and R). Let's walk through setting up Spark and validating it with the Scala API.

Download the Spark binary (spark-2.1.0-bin-hadoop2.7.tgz) from the Apache Spark downloads page.


Uncompress the downloaded archive:

[hadoop@node1 hadoop]$ tar -xvf spark-2.1.0-bin-hadoop2.7.tgz

Rename the extracted directory to a shorter name:

[hadoop@node1 hadoop]$ mv spark-2.1.0-bin-hadoop2.7 spark-2.1.0

Set SPARK_HOME in .bash_profile:

vi .bash_profile

export SPARK_HOME=/mnt/oracle/hadoop/spark-2.1.0
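For spark-shell to resolve from any directory (as the "which spark-shell" check below expects), $SPARK_HOME/bin should also be on the PATH. A minimal sketch of the full profile addition, assuming the install path used above:

```shell
# Append to ~/.bash_profile (adjust the path if your install location differs)
export SPARK_HOME=/mnt/oracle/hadoop/spark-2.1.0
export PATH=$PATH:$SPARK_HOME/bin
```

Reload the profile (source ~/.bash_profile) so the current shell picks up the change.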



Validate the setup by checking the Spark and Scala versions:

[hadoop@node1 ~]$ echo $SPARK_HOME


[hadoop@node1 ~]$ which spark-shell


[hadoop@node1 sbin]$ spark-shell

Using Spark’s default log4j profile: org/apache/spark/

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


Spark context available as 'sc' (master = local[*], app id = local-1489331022269).

Spark session available as 'spark'.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/



Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)

Type in expressions to have them evaluated.

Type :help for more information.

scala> sc.version

res0: String = 2.1.0

Start the Spark Services:

Navigate to $SPARK_HOME/sbin and start the services:

[hadoop@node1 sbin]$ ./start-all.sh

Check the Master log to confirm startup:

view /mnt/oracle/hadoop/spark-2.1.0/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out

This starts the Spark Master and the Worker (slave).
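One quick way to confirm the Master actually came up is to search its log for the startup messages. This is a sketch assuming the log path shown above and the standard Spark 2.x Master log lines ("Starting Spark master at ...", "Bound MasterWebUI ..."):

```shell
# Look for the Master's startup/bind messages in its log
LOG=/mnt/oracle/hadoop/spark-2.1.0/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out
grep -E "Starting Spark master|Bound MasterWebUI" "$LOG"
```

Running jps should also list the Master and Worker processes once both daemons are up.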

Submit a sample program to validate the environment. SparkPi estimates the value of pi by Monte Carlo sampling; the job below is allocated 1 GB of executor memory and 1 core.
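SparkPi throws random points at the unit square and counts how many land inside the quarter circle; 4 * hits / total approximates pi. The same idea can be sketched without Spark in a few lines of awk (the sample count and seed here are illustrative, not from the Spark example):

```shell
# Monte Carlo pi estimate, mirroring the logic of SparkPi (no Spark required)
awk 'BEGIN {
  srand(42)                      # fixed seed for repeatability
  n = 100000; hits = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()       # random point in the unit square
    if (x*x + y*y <= 1) hits++   # inside the quarter circle?
  }
  printf "Pi is roughly %f\n", 4 * hits / n
}'
```

The spark-submit run below distributes the same sampling across 10 partitions on the cluster.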

[hadoop@node1 jars]$ spark-submit --class org.apache.spark.examples.SparkPi --master spark://node1:7077 --executor-memory 1G --total-executor-cores 1 /mnt/oracle/hadoop/spark-2.1.0/examples/jars/spark-examples_2.11-2.1.0.jar 10


Using Spark’s default log4j profile: org/apache/spark/

17/03/12 11:20:41 INFO SparkContext: Running Spark version 2.1.0

17/03/12 11:20:43 INFO Utils: Successfully started service 'sparkDriver' on port 59659.

17/03/12 11:20:43 INFO SparkEnv: Registering MapOutputTracker

17/03/12 11:20:43 INFO SparkEnv: Registering BlockManagerMaster

17/03/12 11:20:43 INFO BlockManagerMasterEndpoint: Using for getting topology information

17/03/12 11:20:43 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up

17/03/12 11:20:43 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7fbe65f7-b525-4ef3-b85b-ded22d85e476

17/03/12 11:20:43 INFO MemoryStore: MemoryStore started with capacity 408.9 MB

17/03/12 11:20:43 INFO SparkEnv: Registering OutputCommitCoordinator

17/03/12 11:20:43 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

17/03/12 11:20:43 INFO Utils: Successfully started service 'SparkUI' on port 4041.

17/03/12 11:20:43 INFO SparkUI: Bound SparkUI to, and started at

17/03/12 11:20:44 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://node1:7077…

17/03/12 11:20:44 INFO Utils: Successfully started service '' on port 39583.

17/03/12 11:20:44 INFO BlockManagerMasterEndpoint: Registering block manager with 408.9 MB RAM, BlockManagerId(driver,, 39583, None)

17/03/12 11:20:44 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver,, 39583, None)

17/03/12 11:20:44 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20170312112044-0000/0 is now RUNNING

17/03/12 11:20:45 INFO SparkContext: Starting job: reduce at SparkPi.scala:38

17/03/12 11:20:45 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions

17/03/12 11:20:45 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)

17/03/12 11:20:46 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 408.9 MB)

17/03/12 11:20:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on (size: 1172.0 B, free: 366.3 MB)

17/03/12 11:20:48 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1,, executor 0, partition 1, PROCESS_LOCAL, 6027 bytes)

17/03/12 11:20:48 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.819464 s

Pi is roughly 3.1436991436991435

17/03/12 11:20:48 INFO SparkUI: Stopped Spark web UI at

17/03/12 11:20:48 INFO StandaloneSchedulerBackend: Shutting down all executors

17/03/12 11:20:48 INFO MemoryStore: MemoryStore cleared

17/03/12 11:20:48 INFO BlockManager: BlockManager stopped

17/03/12 11:20:48 INFO BlockManagerMaster: BlockManagerMaster stopped

17/03/12 11:20:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

17/03/12 11:20:48 INFO SparkContext: Successfully stopped SparkContext

17/03/12 11:20:48 INFO ShutdownHookManager: Shutdown hook called

17/03/12 11:20:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-fd0c2466-1b34-4f39-9adc-c3d958cbc393

So, the value of pi comes out to roughly 3.1437. This completes the Spark setup and validates the environment by running a sample program.
