Saturday, 20 March 2021

Setup of Spark Scala Program for WordCount in Windows 10

Install Spark:

Download Spark 3.1.1 (the latest version at the time of writing) from: https://www.apache.org/dyn/closer.lua/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

Extract the archive to a directory (e.g. C:/Program Files/)

Install Scala:

Spark 3.1.1 is compatible with Scala 2.12.10

Download the Windows binaries from https://www.scala-lang.org/download/2.12.10.html

Set up the environment variables:

SPARK_HOME: c:\progra~1\spark\spark-3.1.1-bin-hadoop3.2

SCALA_HOME: the directory where Scala 2.12.10 was installed (not the Spark directory)

Path: %Path%;%SCALA_HOME%\bin;%SPARK_HOME%\bin;

Download the sample code from: https://github.com/anjuprasannan/WordCountExampleSpark

Configuring Scala project in IntelliJ: https://docs.scala-lang.org/getting-started/intellij-track/getting-started-with-scala-in-intellij.html

Add Maven support by following the steps at: https://www.jetbrains.com/help/idea/convert-a-regular-project-into-a-maven-project.html

Modify the pom.xml file to match the one in the Git repository.

Build the project: mvn clean install

Edit the input and output directories in https://github.com/anjuprasannan/WordCountExampleSpark/blob/main/src/main/scala/WordCount.scala [Note that the output location must be a nonexistent directory.]

Run the application: right-click WordCount -> Run 'WordCount'

The output directory is created and contains the result.



"A life spent making mistakes is not only more honorable, but more useful than a life spent doing nothing."


Wednesday, 17 March 2021

Setup of Map Reduce Program for WordCount in Windows 10

Java Download: https://www.oracle.com/in/java/technologies/javase/javase-jdk8-downloads.html

Maven Download: https://maven.apache.org/download.cgi

Maven installation: https://maven.apache.org/install.html

Eclipse download and installation: https://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/2020-12/R/eclipse-java-2020-12-R-win32-x86_64.zip

Hadoop Download: https://hadoop.apache.org/release/3.2.1.html

winutils: Download https://github.com/cdarlint/winutils/blob/master/hadoop-3.2.1/bin/winutils.exe and copy it to the bin folder under the Hadoop installation directory.

System Variables setup:

JAVA_HOME: C:\Program Files\Java\jdk1.8.0_281

HADOOP_HOME: C:\Program Files\hadoop\hadoop-3.2.1

Path: %PATH%;%JAVA_HOME%\bin;C:\Program Files\apache-maven-3.6.3\bin;%HADOOP_HOME%\sbin;
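
Before moving on, it can help to confirm that the variables above are visible to Java programs. The following is a minimal sketch, not part of the MapReduceExample repository; the class name EnvCheck and the method missingVariables are hypothetical names chosen for illustration. It reports which of the required variables are unset or blank.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the MapReduceExample repository.
final class EnvCheck {

    // Returns the names of the required variables that are absent or blank in env.
    static List<String> missingVariables(Map<String, String> env, String... required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Checks the two variables defined in this post against the real environment.
        List<String> missing = missingVariables(System.getenv(), "JAVA_HOME", "HADOOP_HOME");
        if (missing.isEmpty()) {
            System.out.println("All required variables are set.");
        } else {
            System.out.println("Missing variables: " + missing);
        }
    }
}
```

Run it from a new terminal or after restarting the IDE, since processes started before the variables were changed usually do not see the new values.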

Map Reduce Code Setup:

Map Reduce Code: https://github.com/anjuprasannan/MapReduceExample

Check out the code from Git:

git clone https://github.com/anjuprasannan/MapReduceExample.git

Import the project into Eclipse

Edit the input, output and Hadoop home directory locations in "/MapReduceExample/src/com/anjus/mapreduceexample/WordCount.java" [Note that the output location must be a nonexistent directory.]
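
The requirement on the output location can be checked up front. Hadoop's file-based output formats normally refuse to run when the output directory already exists, so a small pre-check saves a failed run. This is a sketch under assumptions: OutputDirCheck, isFreeForOutput and the default path are hypothetical and do not come from the repository.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-check for illustration; not part of the MapReduceExample repository.
final class OutputDirCheck {

    // Returns true only when it is certain that nothing exists at the given path.
    static boolean isFreeForOutput(Path outputDir) {
        return Files.notExists(outputDir);
    }

    public static void main(String[] args) {
        // Assumed example path; pass the real output location as the first argument.
        Path outputDir = Paths.get(args.length > 0 ? args[0] : "C:/tmp/wordcount-output");
        if (isFreeForOutput(outputDir)) {
            System.out.println("OK: " + outputDir + " does not exist yet.");
        } else {
            System.out.println("Delete or rename " + outputDir + " before running the job.");
        }
    }
}
```

Files.notExists returns false when the existence of the path cannot be determined, so the check errs on the side of asking for a different directory.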

Build the project:

Right click project -> Maven Clean

Right click project -> Maven Install

Job Execution:

Right-click "/MapReduceExample/src/com/anjus/mapreduceexample/WordCount.java" -> Run As -> Java Application
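
For orientation, the computation a word count job performs can be sketched in plain Java without any Hadoop classes. This is not the repository's code: WordCountSketch and countWords are hypothetical names, and splitting on whitespace is an assumption about how words are tokenised. The inner loop plays the role of the map step (emit each word with a count of 1) and the merge call plays the role of the reduce step (sum the counts per word).

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch for illustration; it uses no Hadoop APIs.
final class WordCountSketch {

    // Counts how often each whitespace-separated word occurs across all lines.
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word for stable output
        for (String line : lines) {
            // "Map" step: split the line into words.
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // "Reduce" step: add 1 to the running total for this word.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("to be or not to be", "to do"));
        // Prints one "word<TAB>count" pair per line.
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

A real job distributes these two steps across tasks and writes the totals to the output directory; the sketch only shows what ends up being counted.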



"Don't be afraid to give up the good to go for the great."