Sunday, 13 December 2015

Weka

The rise of search engines and social media sites has caused an explosion of data. The situation is often summed up as “We are drowning in data, but starving for knowledge”. Companies spend millions building data warehouses to store this data, yet most of them fail to get the expected ROI from it. This is where data mining comes in. Data mining is the process of gaining knowledge by analyzing the patterns and trends in data. Tools such as R, RapidMiner and Weka are used for this purpose.

Weka stands for Waikato Environment for Knowledge Analysis. It is a statistical and data analysis tool written in Java, developed by a team of researchers at the University of Waikato in New Zealand. Weka is a collection of visualization tools and algorithms for data analysis. It supports most of the standard data mining tasks, such as data preprocessing, clustering, classification, regression, visualization and feature selection.

Weka is open source data mining software, available under the GNU General Public License. It was originally written in C and was later rewritten in Java, so it runs on any platform that has a Java runtime. It also provides a GUI for ease of use. Weka assumes that the data is available as a flat file in which every record is described by the same fixed set of attributes. The most stable version of Weka is 3.6.13, which was released on September 11, 2015.
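To make the flat-file assumption concrete, here is a minimal sketch in plain Java that reads a tiny ARFF-style text (ARFF is Weka's native file format) and checks that every row has exactly as many values as there are declared attributes. This is not Weka's own parser: the class name `FlatFileSketch`, the simplified syntax (no quoting, no sparse rows) and the toy weather rows are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Weka.
class FlatFileSketch {
    /** Attribute names declared in the header, in order. */
    public final List<String> attributes = new ArrayList<>();
    /** Data rows; every row has exactly attributes.size() values. */
    public final List<String[]> rows = new ArrayList<>();

    /** Parses a simplified ARFF-style text (no quoting, no sparse rows). */
    static FlatFileSketch parse(String text) {
        FlatFileSketch table = new FlatFileSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase();
            if (!inData && lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": keep only the name
                table.attributes.add(line.split("\\s+")[1]);
            } else if (!inData && lower.startsWith("@data")) {
                inData = true;
            } else if (inData) {
                String[] values = line.split(",", -1);
                // The flat-file assumption: one value per declared attribute.
                if (values.length != table.attributes.size()) {
                    throw new IllegalArgumentException("Row has " + values.length
                            + " values, expected " + table.attributes.size());
                }
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                table.rows.add(values);
            }
            // Other header lines such as @relation are ignored in this sketch.
        }
        return table;
    }

    public static void main(String[] args) {
        // Toy rows invented for this example.
        String text = String.join("\n",
                "% toy data",
                "@relation weather",
                "@attribute outlook {sunny, overcast, rainy}",
                "@attribute temperature numeric",
                "@attribute play {yes, no}",
                "@data",
                "sunny,85,no",
                "overcast,83,yes",
                "rainy,70,yes");
        FlatFileSketch table = parse(text);
        System.out.println(table.attributes + " / " + table.rows.size() + " rows");
    }
}
```

A row with a missing or extra value is rejected with an exception, which is the point of the fixed-attribute assumption: every record must fit the same table layout.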

Features of Weka

The following are the important features of Weka:
  • Open source software: Weka is freely available under the GNU GPL, and its source code is written in Java.
  • Designed for data analysis: It consists of a vast collection of algorithms for data mining and machine learning, and is kept up to date as new algorithms are added.
  • Ease of use: It is easily usable by people who are not data mining specialists.
  • Platform independence: Because it is written in Java, Weka is platform independent.

Functionalities provided by Weka

The following are the basic functionalities provided by Weka:
  • Data preprocessing: Weka supports various data formats and can also read data from databases through JDBC.
  • Classification: Weka includes more than 100 classification algorithms. Classifiers are divided into Bayesian methods (Naïve Bayes, Bayesian nets etc.), lazy methods (nearest neighbor and variants), rule-based methods (decision tables, OneR, RIPPER etc.), tree learners (C4.5, Naïve Bayes trees, M5), function-based learners (linear regression, SVM, Gaussian processes) and other miscellaneous methods.
  • Clustering: The clustering algorithms implemented in Weka include k-means, EM and various hierarchical methods.
  • Attribute selection: The classifier performance depends on the attributes selected. Various search methods and selection criteria are available for attribute selection.
  • Data visualization: Visualization options include a tree viewer, a dendrogram viewer and a Bayes network viewer.
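To give a feel for what one of the rule-based classifiers listed above does, here is a minimal sketch of the OneR idea in plain Java: for each attribute, predict the majority class for each of its values, then keep the single attribute whose rule makes the fewest mistakes on the training data. This is not Weka's implementation. The class name `OneRSketch`, the restriction to nominal values with the class label in the last column, the alphabetical tie-breaking and the toy rows are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the OneR idea; not Weka's own OneR class.
class OneRSketch {
    /** Index of the attribute chosen by train(). */
    int bestAttribute = -1;
    /** The learned rule: attribute value -> predicted class. */
    Map<String, String> rule = new TreeMap<>();
    /** Overall majority class, used for attribute values never seen in training. */
    String defaultClass;

    /** Returns the key with the highest count (ties go to the alphabetically first key). */
    static String majority(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Each row holds nominal values; the last column is the class label. */
    void train(String[][] rows) {
        int classIdx = rows[0].length - 1;
        Map<String, Integer> classCounts = new TreeMap<>();
        for (String[] r : rows) {
            classCounts.merge(r[classIdx], 1, Integer::sum);
        }
        defaultClass = majority(classCounts);

        int bestErrors = Integer.MAX_VALUE;
        for (int a = 0; a < classIdx; a++) {
            // For attribute a: value -> (class -> count)
            Map<String, Map<String, Integer>> table = new TreeMap<>();
            for (String[] r : rows) {
                table.computeIfAbsent(r[a], k -> new TreeMap<>())
                     .merge(r[classIdx], 1, Integer::sum);
            }
            Map<String, String> candidate = new TreeMap<>();
            int errors = 0;
            for (Map.Entry<String, Map<String, Integer>> e : table.entrySet()) {
                String predicted = majority(e.getValue());
                candidate.put(e.getKey(), predicted);
                int total = 0;
                for (int c : e.getValue().values()) {
                    total += c;
                }
                // Rows with this value that do not belong to the majority class.
                errors += total - e.getValue().get(predicted);
            }
            if (errors < bestErrors) {
                bestErrors = errors;
                bestAttribute = a;
                rule = candidate;
            }
        }
    }

    /** Predicts a class from the chosen attribute only. */
    String predict(String[] row) {
        return rule.getOrDefault(row[bestAttribute], defaultClass);
    }

    public static void main(String[] args) {
        // Toy rows invented for this example: outlook, windy, play
        String[][] rows = {
            {"sunny", "false", "no"},    {"sunny", "true", "no"},
            {"overcast", "false", "yes"}, {"overcast", "true", "yes"},
            {"rainy", "false", "yes"},    {"rainy", "true", "yes"},
        };
        OneRSketch oneR = new OneRSketch();
        oneR.train(rows);
        System.out.println("attribute " + oneR.bestAttribute + ", rule " + oneR.rule);
        System.out.println(oneR.predict(new String[] {"sunny", "true"}));
    }
}
```

In Weka itself none of this has to be written by hand: the classifiers are selected from the GUI or called through the Java API. The sketch only shows how simple the underlying idea of a one-attribute rule is.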


More updates about Weka in the next post.

Nothing is softer than water, But its force can break the hardest rock.
