Saturday 20 March 2021

Setting Up a Spark Scala WordCount Program on Windows 10

Install Spark:

Download Spark 3.1.1 (the latest version at the time of writing) from: https://www.apache.org/dyn/closer.lua/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

Extract the archive to a directory (e.g., C:/Program Files/).

Install Scala:

Spark 3.1.1 is compatible with Scala 2.12.10.

Download the Windows binaries from https://www.scala-lang.org/download/2.12.10.html

Set up Environment Variables:

SPARK_HOME: c:\progra~1\spark\spark-3.1.1-bin-hadoop3.2

SCALA_HOME: the directory where Scala is installed (this must point to the Scala installation, not to the Spark directory)

Path: %Path%;%SCALA_HOME%\bin;%SPARK_HOME%\bin;
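To confirm that the variables above are visible to the JVM, a small check like the following can help. This is only a sketch: the object name EnvCheck and the helper describe are made up for illustration and are not part of the sample repository.

```scala
object EnvCheck {
  // Returns a one-line description of an environment variable,
  // looked up in the supplied map (normally sys.env).
  def describe(name: String, env: Map[String, String]): String =
    env.get(name) match {
      case Some(value) if value.trim.nonEmpty => s"$name = $value"
      case _                                  => s"$name is not set"
    }

  def main(args: Array[String]): Unit =
    // Print the two variables configured in this post.
    Seq("SPARK_HOME", "SCALA_HOME").foreach { name =>
      println(describe(name, sys.env))
    }
}
```

Open a new terminal (or restart IntelliJ) after changing the variables, since processes that are already running do not pick up the new values.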

Download the sample code from: https://github.com/anjuprasannan/WordCountExampleSpark

Configure a Scala project in IntelliJ by following: https://docs.scala-lang.org/getting-started/intellij-track/getting-started-with-scala-in-intellij.html

Add Maven support by following the steps at: https://www.jetbrains.com/help/idea/convert-a-regular-project-into-a-maven-project.html

Modify the pom.xml file to match the one in the Git repository.

Build the project with: mvn clean install

Edit the input and output directories in https://github.com/anjuprasannan/WordCountExampleSpark/blob/main/src/main/scala/WordCount.scala [Note that the output location must be a directory that does not exist yet.]
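For orientation, a word count in Spark usually looks roughly like the sketch below. This is not the code from the repository. The object name WordCountSketch, the tokenize helper, and both paths are placeholders chosen for illustration, so replace the paths with your own input file and a new output directory.

```scala
import org.apache.spark.sql.SparkSession

object WordCountSketch {
  // Splits a line on whitespace and drops empty tokens.
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Placeholder paths (assumptions): edit these for your machine.
    val inputPath  = "C:/data/input.txt"
    val outputPath = "C:/data/wordcount-output" // must not exist yet

    // Run Spark locally, using all available cores.
    val spark = SparkSession.builder()
      .appName("WordCountSketch")
      .master("local[*]")
      .getOrCreate()

    try {
      val counts = spark.sparkContext
        .textFile(inputPath)        // one record per line
        .flatMap(tokenize)          // one record per word
        .map(word => (word, 1))     // pair each word with a count of 1
        .reduceByKey(_ + _)         // sum the counts per word

      // Writes part files into outputPath.
      counts.saveAsTextFile(outputPath)
    } finally {
      spark.stop()
    }
  }
}
```

The output path has to be new because saveAsTextFile creates the directory itself and refuses to write into one that is already there.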

Run the application: right-click WordCount -> Run 'WordCount'

Once the run finishes, the output directory is created and contains the result.



"A life spent making mistakes is not only more honorable, but more useful than a life spent doing nothing."

