Saturday, 9 May 2015

More about R

Pros and Cons of R

Advantages

  • Open Source
  • Built-in data analysis and statistical functions
  • Interfaces with databases
  • Data handling and storage
  • High-quality graphics

Disadvantages

  • Steep learning curve
  • Working with large datasets is limited by RAM
  • Language interpreter can be very slow
  • No official commercial support from the R project itself; help comes mainly from the community and third-party vendors
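
To illustrate the point about interpreter speed: an explicit loop is evaluated step by step by the interpreter, while a vectorised call does the same work inside compiled code. The following is only a minimal sketch; the function names loop_sum_sq and vec_sum_sq are made up for this example, and no timings are claimed here.

```r
# Input vector used for both versions
x <- seq_len(100000)

# Explicit loop: every iteration is evaluated by the R interpreter
loop_sum_sq <- function(v) {
  total <- 0
  for (value in v) {
    total <- total + value^2
  }
  total
}

# Vectorised version: the squaring and summing happen in compiled code
vec_sum_sq <- function(v) sum(as.numeric(v)^2)

loop_result <- loop_sum_sq(x)
vec_result  <- vec_sum_sq(x)

# Both approaches should give the same answer
print(all.equal(loop_result, vec_result))
```

Wrapping each call in system.time() is one way to compare the two versions on your own machine.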

High Performance Computing using R

The major limitations of R are:
    • By default, R uses only a single core, regardless of how many cores the CPU has.
    • R loads the entire dataset into memory (RAM) before working on it.

To address these problems, several high-performance computing packages have been introduced for R. Some of them are parallel, ff and bigmemory.
  • parallel: The parallel package includes functionality from the snow and multicore packages. Although it supports parallelism, performance degrades when dealing with large datasets.
  • ff: It provides file-based access to datasets that do not fit in memory. The main bottleneck is that it does not support character vectors.
  • bigmemory: This package uses external pointers to refer to large matrices held in shared memory or in memory-mapped files.
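
As a rough sketch of how the parallel package can spread work over more than one core, the example below runs the same task with parLapply and with plain lapply and checks that the results match. The function slow_square and the choice of two workers are assumptions made only for this illustration.

```r
library(parallel)

# Hypothetical task used only for illustration: a short pause, then square
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

inputs <- 1:8

# Assumption: two worker processes, so the sketch also runs on small machines
n_workers <- 2
cl <- makeCluster(n_workers)

# Each worker process handles part of the input list
parallel_result <- parLapply(cl, inputs, slow_square)

# Always release the worker processes when done
stopCluster(cl)

# Single-core version for comparison
sequential_result <- lapply(inputs, slow_square)

print(identical(parallel_result, sequential_result))
```

Starting worker processes has a cost of its own, so for very small tasks the single-core version can still be faster.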

Alternatives to R

The following are the main alternatives to R:
  • MATLAB: MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming. MATLAB has stronger support for the physical sciences, while R is stronger for statistics.
  • Maxima: Maxima is a free computer algebra system written in Lisp, based on a 1982 version of Macsyma.
  • gnuplot: gnuplot is a plotting program and is much simpler than R.
  • Scilab: Scilab is an open source language for numerical computation. Its syntax is similar to MATLAB's.
  • Octave: Octave is a high-level language used for numerical computation.
  • Mahout: Mahout is a library of machine learning algorithms built on top of Apache Hadoop and MapReduce.

R vs. Mahout

Apache Hadoop is used for processing big data, and Mahout is a machine learning library that runs on Hadoop. The major drawback of R is its memory limitation: as a rule of thumb, R needs about three times the dataset size in RAM to work comfortably. Hence Mahout is the best of these alternatives to R when dealing with large datasets.
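
The three-times rule of thumb can be turned into a quick back-of-the-envelope calculation. This is only a sketch: estimate_ram_gb is a made-up helper, and it assumes a purely numeric dataset at 8 bytes per value, so real datasets with other column types will differ.

```r
# Hypothetical helper, not part of any package.
# Assumes every value is a double (8 bytes) and applies the
# "three times the dataset size" rule of thumb from the text.
estimate_ram_gb <- function(n_rows, n_cols, bytes_per_value = 8, overhead = 3) {
  dataset_bytes <- n_rows * n_cols * bytes_per_value
  overhead * dataset_bytes / 1024^3
}

# Example: 10 million rows and 20 numeric columns
dataset_gb <- estimate_ram_gb(1e7, 20, overhead = 1)  # size of the data itself
needed_gb  <- estimate_ram_gb(1e7, 20)                # with the 3x rule applied

print(round(dataset_gb, 2))
print(round(needed_gb, 2))
```

With these assumptions the data alone is about 1.5 GB, so roughly 4.5 GB of RAM would be needed to work comfortably.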

Model                          Implementation in R   Implementation in Mahout
Decision Tree                  Yes                   No
Random Forest                  Yes                   Yes
Stepwise Logistic Regression   Yes                   No
Neural Networks                Yes                   No
Continuous network(Y)          Yes                   No
Table: Comparison of R and Mahout

Note


R is a free programming language for statistical computing and graphics. It contains many advanced statistical routines and runs on a variety of platforms, including UNIX, Windows and Mac OS. Limited forecasting ability and a lack of complex predictive analytics algorithms are major shortcomings of many currently available data analysis tools; R, as a statistical and predictive language, addresses these issues. It is recommended in scenarios where the different steps of an analysis should be documented for future updates.


You are probably bored of reading about R by now. I will write about another interesting technology in the next post!
Happy Learning 

2 comments:

  1. Hi Anju,
    Thanks for the information. Can you also share your thoughts on vectorization in R?
    Vectorization is said to increase R's performance many times over.

    Thanks,
    Viney

    1. Hi Viney,
      I'll update you soon on the topic you mentioned.

      Regards
      Anju Prasannan
