Spark Setup on RHEL 7.2 and Apache Hadoop 2.7

Spark exposes multiple language APIs (Scala, Java, Python, and R) through which it can be used. Let's walk through a Spark setup using the Scala API.

Download Spark from http://spark.apache.org/downloads.html


Uncompress the downloaded archive:

[hadoop@node1 hadoop]$ tar -xvf spark-2.1.0-bin-hadoop2.7.tgz

Rename the extracted directory to a shorter name:

[hadoop@node1 hadoop]$ mv spark-2.1.0-bin-hadoop2.7 spark-2.1.0

Set the required environment variables in .bash_profile:

[hadoop@node1 ~]$ vi .bash_profile

export SPARK_HOME=/mnt/oracle/hadoop/spark-2.1.0

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SQOOP_HOME/bin:$SPARK_HOME/bin
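After saving, reload the profile so the new variables take effect in the current session:

[hadoop@node1 ~]$ source ~/.bash_profile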

Validate the setup by checking SPARK_HOME, locating spark-shell, and launching the shell to confirm the Spark and Scala versions:

[hadoop@node1 ~]$ echo $SPARK_HOME

/mnt/oracle/hadoop/spark-2.1.0

[hadoop@node1 ~]$ which spark-shell

/mnt/oracle/hadoop/spark-2.1.0/bin/spark-shell

[hadoop@node1 sbin]$ spark-shell

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

Spark context available as 'sc' (master = local[*], app id = local-1489331022269).

Spark session available as 'spark'.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

 

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)

Type in expressions to have them evaluated.

Type :help for more information.

scala> sc.version

res0: String = 2.1.0
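As an extra sanity check, you can run a small computation directly in the shell; any simple RDD action will do, and the sum of 1 to 100 should come back as 5050.0:

scala> sc.parallelize(1 to 100).sum()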

Start the Spark Services:

Navigate to $SPARK_HOME/sbin and invoke start-all.sh:

[hadoop@node1 sbin]$ ./start-all.sh

Check the Master log to confirm a clean startup:

[hadoop@node1 sbin]$ view /mnt/oracle/hadoop/spark-2.1.0/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-node1.out

This starts the Spark standalone Master and Worker daemons.
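You can confirm both daemons are running with jps (shipped with the JDK); a Master and a Worker process should appear among the running JVMs. The standalone Master web UI should also be reachable, by default on port 8080 (http://node1:8080 here):

[hadoop@node1 sbin]$ jps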

Finally, submit a sample program to validate the environment. The bundled SparkPi example estimates the value of π; the job below is allocated 1 GB of executor memory and a single core.
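For context, SparkPi estimates π by Monte Carlo sampling: it scatters random points over a square and counts the fraction that lands inside the inscribed circle. A minimal Scala sketch of the same idea, runnable in spark-shell (not the exact example source, which ships with Spark):

val n = 100000 * 10  // total samples, spread over 10 partitions
val inside = sc.parallelize(1 to n, 10).map { _ =>
  val x = math.random * 2 - 1  // random point in the square [-1, 1] x [-1, 1]
  val y = math.random * 2 - 1
  if (x * x + y * y <= 1) 1 else 0  // count points inside the unit circle
}.reduce(_ + _)
println(s"Pi is roughly ${4.0 * inside / n}")  // circle/square area ratio is pi/4

The packaged version is submitted against the standalone master as follows: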

[hadoop@node1 jars]$ spark-submit --class org.apache.spark.examples.SparkPi --master spark://node1:7077 --executor-memory 1G --total-executor-cores 1 /mnt/oracle/hadoop/spark-2.1.0/examples/jars/spark-examples_2.11-2.1.0.jar 10

Output:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

17/03/12 11:20:41 INFO SparkContext: Running Spark version 2.1.0

17/03/12 11:20:43 INFO Utils: Successfully started service 'sparkDriver' on port 59659.

17/03/12 11:20:43 INFO SparkEnv: Registering MapOutputTracker

17/03/12 11:20:43 INFO SparkEnv: Registering BlockManagerMaster

17/03/12 11:20:43 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information

17/03/12 11:20:43 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up

17/03/12 11:20:43 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7fbe65f7-b525-4ef3-b85b-ded22d85e476

17/03/12 11:20:43 INFO MemoryStore: MemoryStore started with capacity 408.9 MB

17/03/12 11:20:43 INFO SparkEnv: Registering OutputCommitCoordinator

17/03/12 11:20:43 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

17/03/12 11:20:43 INFO Utils: Successfully started service 'SparkUI' on port 4041.

17/03/12 11:20:43 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.9.81.180:4041

17/03/12 11:20:44 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://node1:7077...

17/03/12 11:20:44 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39583.

17/03/12 11:20:44 INFO BlockManagerMasterEndpoint: Registering block manager 10.9.81.180:39583 with 408.9 MB RAM, BlockManagerId(driver, 10.9.81.180, 39583, None)

17/03/12 11:20:44 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.9.81.180, 39583, None)

17/03/12 11:20:44 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20170312112044-0000/0 is now RUNNING

17/03/12 11:20:45 INFO SparkContext: Starting job: reduce at SparkPi.scala:38

17/03/12 11:20:45 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions

17/03/12 11:20:45 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)

17/03/12 11:20:46 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 408.9 MB)

17/03/12 11:20:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.9.81.180:33660 (size: 1172.0 B, free: 366.3 MB)

17/03/12 11:20:48 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 10.9.81.180, executor 0, partition 1, PROCESS_LOCAL, 6027 bytes)

17/03/12 11:20:48 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.819464 s

Pi is roughly 3.1436991436991435

17/03/12 11:20:48 INFO SparkUI: Stopped Spark web UI at http://10.9.81.180:4041

17/03/12 11:20:48 INFO StandaloneSchedulerBackend: Shutting down all executors

17/03/12 11:20:48 INFO MemoryStore: MemoryStore cleared

17/03/12 11:20:48 INFO BlockManager: BlockManager stopped

17/03/12 11:20:48 INFO BlockManagerMaster: BlockManagerMaster stopped

17/03/12 11:20:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

17/03/12 11:20:48 INFO SparkContext: Successfully stopped SparkContext

17/03/12 11:20:48 INFO ShutdownHookManager: Shutdown hook called

17/03/12 11:20:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-fd0c2466-1b34-4f39-9adc-c3d958cbc393

So the estimated value of π is 3.143699..., close to the true value. This completes the Spark setup and validates the environment with a sample program.
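When the cluster is no longer needed, the standalone daemons can be shut down with the matching stop script in $SPARK_HOME/sbin:

[hadoop@node1 sbin]$ ./stop-all.sh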
