Saturday 20 March 2021

Setup of Spark Scala Program for WordCount in Windows10

Install Spark:

Download the latest version of Spark from: https://www.apache.org/dyn/closer.lua/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

Extract the archive to a directory (e.g., C:\Program Files\)
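Windows 10 ships with a tar utility, so one way to extract the archive is to run the following from the download folder (writing to C:\Program Files may need an elevated Command Prompt; adjust the destination as you like):

tar -xzf spark-3.1.1-bin-hadoop3.2.tgz -C "C:\Program Files"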

Install Scala:

Spark 3.1.1 is compatible with Scala 2.12.10

Download the Windows binaries from https://www.scala-lang.org/download/2.12.10.html

Setup of Environment Variables:

SPARK_HOME: c:\progra~1\spark\spark-3.1.1-bin-hadoop3.2

SCALA_HOME: the directory where Scala was installed (e.g., c:\progra~1\scala)

Path: %Path%;%SCALA_HOME%\bin;%SPARK_HOME%\bin;
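To verify the setup, open a new Command Prompt and run the commands below; each should print a version banner (spark-submit is used here since it reliably reports the Spark version):

spark-submit --version
scala -version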

Download the sample code from: https://github.com/anjuprasannan/WordCountExampleSpark

Configuring Scala project in IntelliJ: https://docs.scala-lang.org/getting-started/intellij-track/getting-started-with-scala-in-intellij.html

Add Maven support by following the steps at: https://www.jetbrains.com/help/idea/convert-a-regular-project-into-a-maven-project.html

Modify the pom.xml file as per the Git repository.

Build the project: mvn clean install

Edit the input and output directories in https://github.com/anjuprasannan/WordCountExampleSpark/blob/main/src/main/scala/WordCount.scala [Note that the output location should be a non-existent directory.]
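For orientation, a minimal Spark word count in Scala follows the shape of the sketch below. The paths here are placeholders, and the repository's actual WordCount.scala may differ in detail:

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Placeholder paths: point these at your input file and a non-existent output directory.
    val inputPath = "C:/data/input.txt"
    val outputPath = "C:/data/output"

    // Run Spark locally using all available cores.
    val spark = SparkSession.builder().appName("WordCount").master("local[*]").getOrCreate()

    spark.sparkContext.textFile(inputPath)
      .flatMap(_.split("\\s+"))   // split each line into words
      .map(word => (word, 1))     // pair each word with a count of 1
      .reduceByKey(_ + _)         // sum the counts per word
      .saveAsTextFile(outputPath) // writes part files to the output directory

    spark.stop()
  }
}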

Execute the application: right-click WordCount -> Run 'WordCount'

You should now see the output directory created with the result.



"A life spent making mistakes is not only more honorable, but more useful than a life spent doing nothing."


Wednesday 17 March 2021

Setup of Map Reduce Program for WordCount in Windows10

Java Download: https://www.oracle.com/in/java/technologies/javase/javase-jdk8-downloads.html

Maven Download: https://maven.apache.org/download.cgi

Maven installation: https://maven.apache.org/install.html

Eclipse download and installation: https://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/2020-12/R/eclipse-java-2020-12-R-win32-x86_64.zip

Hadoop Download: https://hadoop.apache.org/release/3.2.1.html

winutils: https://github.com/cdarlint/winutils/blob/master/hadoop-3.2.1/bin/winutils.exe. Download it and copy it to the bin folder under the Hadoop installation directory.

System Variables setup:

JAVA_HOME: C:\Program Files\Java\jdk1.8.0_281

HADOOP_HOME: C:\Program Files\hadoop\hadoop-3.2.1

Path: %PATH%;%JAVA_HOME%\bin;C:\Program Files\apache-maven-3.6.3\bin;%HADOOP_HOME%\sbin;
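As a quick sanity check, the following commands in a new Command Prompt should each report a version (the hadoop launcher lives under bin, which the Path above does not include, hence the full quoted path):

java -version
mvn -version
"%HADOOP_HOME%\bin\hadoop" version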

Map Reduce Code Setup:

Map Reduce Code: https://github.com/anjuprasannan/MapReduceExample

Check out the code from Git:

git clone https://github.com/anjuprasannan/MapReduceExample.git

Import the project to Eclipse

Edit the input, output, and Hadoop home directory locations in "/MapReduceExample/src/com/anjus/mapreduceexample/WordCount.java" [Note that the output location should be a non-existent directory.]

Build the project:

Right click project -> Maven Clean

Right click project -> Maven Install

Job Execution:

Right-click "/MapReduceExample/src/com/anjus/mapreduceexample/WordCount.java" -> Run As -> Java Application



"Don't be afraid to give up the good to go for the great."





Monday 18 January 2021

Particle Swarm Optimization

Artificial intelligence (AI) is the intelligence exhibited by machines. It is defined as “the study and design of intelligent agents”, where an intelligent agent represents a system that perceives its environment and takes actions that maximize its chance of success. AI research is highly technical and specialized and is deeply divided into subfields that often fail to communicate with each other. Currently popular approaches to AI include traditional statistical methods, traditional symbolic AI, and computational intelligence (CI). CI is a fairly new research area. It is a set of nature-inspired computational methodologies and approaches for addressing complex real-world problems for which traditional approaches are ineffective or infeasible. CI includes artificial neural networks (ANN), fuzzy logic, and evolutionary computation (EC).

Swarm intelligence (SI) is a part of EC. It studies the collective behavior of decentralized, self-organized systems, natural or artificial. Typical SI systems consist of a population of simple agents, or boids, interacting locally with one another and with their environment. The inspiration often comes from nature, especially biological systems. The agents in an SI system follow very simple rules. There is no centralized control structure dictating how individual agents should behave. The agents’ behaviors are local and, to a certain degree, random; however, interactions between such agents lead to the emergence of “intelligent” global behavior, which is unknown to the individual agents. Well-known examples of SI include ant colonies, bird flocking, animal herding, bacterial growth, and fish schooling.

Self-organization is a key feature of SI systems. It is a process where global order or coordination arises out of the local interactions between the components of an initially disordered system. This process is spontaneous; that is, it is not controlled by any agent inside or outside of the system. Self-organization in swarms is interpreted through three basic ingredients, as follows.

(1) Strong dynamical nonlinearity (often involving positive and negative feedback): positive feedback helps promote the creation of convenient structures, while negative feedback counterbalances positive feedback and helps to stabilize the collective pattern.

(2) Balance of exploitation and exploration: SI systems strike a suitable balance between exploiting known good solutions and exploring new ones, which provides a valuable means of artificial creativity.

(3) Multiple interactions: agents in the swarm use information coming from neighbor agents so that the information spreads throughout the network.

Particle Swarm Optimization (PSO) is a technique used to explore the search space of a given problem to find the settings or parameters required to maximize a particular objective. It is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. PSO, a population-based technique for stochastic search in a multidimensional space, has been employed successfully for solving a variety of optimization problems, including many multifaceted problems where popular methods like steepest descent, conjugate gradient, and Newton's method do not give satisfactory results. The technique was first described by James Kennedy and Russell C. Eberhart in 1995. The algorithm is inspired by the social behavior of bird flocking and fish schooling. Suppose a group of birds is searching for food in an area and only one piece of food is available. The birds have no knowledge of the location of the food, but they know how far the food is from their present location. So the best strategy to locate the food is to follow the bird nearest to the food.

A flying bird has a position and a velocity at any time t. In search of food, the bird changes its position by adjusting its velocity. The velocity changes based on its past experience and the feedback received from neighbouring birds. This searching process can be artificially simulated for solving non-linear optimization problems. PSO is thus a population-based stochastic optimization technique inspired by the social behaviour of bird flocking or fish schooling. In this algorithm each candidate solution is considered a bird, called a particle. Every particle has a fitness value, calculated using the objective function. Each particle remembers its own best performance, and all particles know the best performance of the group. They adjust their velocity considering both their own best performance and the best performance of the best particle.

The usual aim of the particle swarm optimization (PSO) algorithm is to solve an unconstrained minimization problem: find x* such that f(x*) <= f(x) for all d-dimensional real vectors x. The objective function f: R^d -> R is called the fitness function. Each particle i in the swarm P has a neighborhood N_i (a subset of P). The structure of the neighborhoods is called the swarm topology, which can be represented by a graph. Usual topologies are the fully connected topology and the circle topology.

Consider the initial state of a four-particle PSO algorithm seeking the global maximum in a one-dimensional search space. The search space is composed of all the possible solutions. The PSO algorithm has no knowledge of the underlying objective function, and thus has no way of knowing whether any of the candidate solutions are near to or far away from a local or global maximum. The PSO algorithm simply uses the objective function to evaluate its candidate solutions and operates on the resultant fitness values. Each particle maintains its position, composed of the candidate solution and its evaluated fitness, and its velocity. Additionally, it remembers the best fitness value it has achieved thus far, referred to as the individual best fitness, and the candidate solution that achieved this fitness, referred to as the individual best position or individual best candidate solution. Finally, the PSO algorithm maintains the best fitness value achieved among all particles in the swarm, called the global best fitness, and the candidate solution that achieved this fitness, called the global best position or global best candidate solution.

The PSO algorithm consists of just three steps, which are repeated until a stopping condition is met:

1. Evaluate the fitness of each particle

2. Update individual and global best fitnesses and positions

3. Update velocity and position of each particle

The first two steps are fairly trivial. Fitness evaluation is conducted by supplying the candidate solution to the objective function. Individual and global best fitnesses and positions are updated by comparing the newly evaluated fitnesses against the previous individual and global best fitnesses, and replacing the best fitnesses and positions as necessary. The velocity and position update step is responsible for the optimization ability of the PSO algorithm. Once the velocity for each particle is calculated, each particle’s position is updated by applying the new velocity to the particle’s previous position. This process is repeated until some stopping condition is met. Some common stopping conditions include: a preset number of iterations of the PSO algorithm, a number of iterations since the last update of the global best candidate solution, or a predefined target fitness value.
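Concretely, in the common global-best formulation, the velocity and position of particle i are updated at each iteration as:

v_i(t+1) = w*v_i(t) + c1*r1*(pbest_i - x_i(t)) + c2*r2*(gbest - x_i(t))
x_i(t+1) = x_i(t) + v_i(t+1)

where w is the inertia weight, c1 and c2 are the cognitive and social acceleration coefficients, r1 and r2 are random numbers drawn uniformly from [0, 1] at each update, pbest_i is particle i's individual best position, and gbest is the global best position.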

The same can be described in detailed steps as follows:

Step: 1 ==> Initialize the particles

Step: 2 ==> Evaluate the fitness of each particle

Step: 3 ==> Modify velocities based on the previous best and global best positions

Step: 4 ==> Check the termination criteria; if not satisfied, go to Step 2

Step: 5 ==> Stop
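To make these steps concrete, here is a minimal, self-contained PSO sketch in Scala (an illustration, not a reference implementation); it assumes a fully connected (global-best) topology, fixed coefficients, and minimizes the sphere function f(x) = x1^2 + ... + xd^2:

import scala.util.Random

object SimplePSO {
  // Fitness to minimize: the sphere function f(x) = sum of x_i^2.
  def fitness(x: Array[Double]): Double = x.map(v => v * v).sum

  def main(args: Array[String]): Unit = {
    val rnd = new Random(42)
    val dim = 2
    val particles = 20
    val iterations = 100
    val (w, c1, c2) = (0.7, 1.5, 1.5) // inertia, cognitive and social coefficients

    // Step 1: random initial positions in [-5, 5]; zero initial velocities.
    val pos = Array.fill(particles, dim)(rnd.nextDouble() * 10 - 5)
    val vel = Array.fill(particles, dim)(0.0)

    val pBest = pos.map(_.clone)      // each particle's individual best position
    val pBestFit = pBest.map(fitness) // ...and its fitness
    var gBest = pBest(pBestFit.indexOf(pBestFit.min)).clone // global best position

    for (_ <- 1 to iterations; i <- 0 until particles) {
      for (d <- 0 until dim) {
        // Step 3: velocity update = inertia + pull toward own best + pull toward global best.
        vel(i)(d) = w * vel(i)(d) +
          c1 * rnd.nextDouble() * (pBest(i)(d) - pos(i)(d)) +
          c2 * rnd.nextDouble() * (gBest(d) - pos(i)(d))
        pos(i)(d) += vel(i)(d) // position update
      }
      // Step 2: evaluate fitness and update individual/global bests.
      val f = fitness(pos(i))
      if (f < pBestFit(i)) { pBestFit(i) = f; pBest(i) = pos(i).clone }
      if (f < fitness(gBest)) gBest = pos(i).clone
    }

    println(s"Best position: ${gBest.mkString(", ")}  fitness: ${fitness(gBest)}")
  }
}

On this convex, single-minimum function the swarm converges quickly toward the origin; for a real application, fitness would be replaced by the problem's own objective and the stopping condition chosen accordingly.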

Unlike Genetic Algorithms (GA), PSOs do not change the population from generation to generation, but keep the same population, iteratively updating the positions of its members (i.e., the particles). PSOs have no “mutation” or “recombination” operators and no notion of “survival of the fittest”. On the other hand, similarly to GAs, an important element of PSOs is that the members of the population “interact”, or “influence” each other.

PSO has several advantages, including fast convergence, few parameters to tune, and simple, easy implementation; hence, it can be used to solve nonlinear, non-differentiable, and multi-peak optimization problems, particularly in science and engineering fields. As a powerful optimization technique, PSO has been extensively applied in different geotechnical engineering aspects such as slope stability analysis, pile and foundation engineering, rock and soil mechanics, and tunneling and underground space design. The fitness function can be non-differentiable (only values of the fitness function are used). The method can be applied to optimization problems of large dimensions, often producing quality solutions more rapidly than alternative methods.

The disadvantages of the PSO algorithm are that it easily falls into a local optimum in high-dimensional spaces and has a low convergence rate in the iterative process. There is no general convergence theory applicable to practical, multidimensional problems. For satisfactory results, tuning of the input parameters and experimenting with various versions of the PSO method is sometimes necessary. The stochastic variability of the PSO results is very high for some problems and some values of the parameters. Also, some versions of the PSO method depend on the choice of the coordinate system.

To address the above-mentioned problems, many solutions have been proposed; they can be divided into the following three types.

(i) Major modifications, including quantum-behaved PSO, bare-bones PSO, chaotic PSO, fuzzy PSO, PSOTVAC, OPSO, SPSO, and topology.

(ii) Minor modifications, including constriction coefficient, velocity clamping, trap detection, adaptive parameter, fitness-scaling, surrogate modeling, cooperative mechanism, boundary shifting, position resetting, entropy map, ecological behavior, jumping-out strategy, preference strategy, neighborhood learning, and local search.

(iii) Hybridization, PSO being hybridized with GA, SA, TS, AIS, ACO, HS, ABC, DE, and so forth.

The modifications of PSO are: QPSO (quantum-behaved PSO), BBPSO (bare-bones PSO), CPSO (chaotic PSO), FPSO (fuzzy PSO), AFPSO (adaptive FPSO), IFPSO (improved FPSO), PSOTVAC (PSO with time-varying acceleration coefficients), OPSO (opposition-based PSO), and SPSO (standard PSO).

The application categories of PSO are “electrical and electronic engineering,” “automation control systems,” “communication theory,” “operations research,” “mechanical engineering,” “fuel and energy,” “medicine,” “chemistry,” and “biology.”

Reference: Information collected from various sources on the Internet


If you are always trying to be normal you will never know how amazing you can be!!