Pros and Cons of R
Advantages
- Open Source
- Built-in data analysis and statistical functions
- Interfaces with databases
- Data handling and storage
- High-quality graphics capability
Disadvantages
- Steep learning curve
- Working with large datasets is limited by RAM
- Language interpreter can be very slow
- No official professional or commercial support from the R project itself
High Performance Computing using R
The major limitations of R are:
- By default, R uses only a single core, regardless of how many cores the CPU has.
- R loads data entirely into memory, so the dataset size is limited by the available RAM.
To address these problems, several packages have been introduced for R. Some of them are: parallel, ff and bigmemory.
- parallel: The parallel package includes the functionality of the snow and multicore packages. Although it supports parallelism, performance degrades when dealing with large datasets.
- ff: It provides file-based access to datasets that do not fit in memory. Its main limitation is that it does not support character vectors.
- bigmemory: This package uses external pointers to refer to large objects stored in memory.
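As a rough illustration of the parallel package described above, here is a minimal sketch that spreads a small task over worker processes. The worker count of 2 and the squaring task are assumptions chosen only for the example; a real job would pick both based on the machine and the workload.

```r
library(parallel)  # ships with base R, no installation needed

# Assumption: two workers, chosen only for illustration.
n_workers <- 2
cl <- makeCluster(n_workers)

# Hypothetical task: square each number on a worker process.
# The cluster is always shut down, even if the task fails.
squares <- tryCatch(
  parLapply(cl, 1:8, function(x) x^2),
  finally = stopCluster(cl)
)

# Combine the per-element results on the main process.
total <- sum(unlist(squares))
print(total)
```

For a task this small, the cost of starting the workers will likely outweigh any gain; the pattern only pays off when each element takes real time to compute.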
Alternatives to R
The following are the main alternatives to R:
- MATLAB: MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming. MATLAB has stronger support for the physical sciences, while R is stronger for statistics.
- Maxima: Maxima is a free computer algebra system written in Lisp, based on the 1982 version of Macsyma.
- gnuplot: gnuplot is a plotting program and is much simpler than R.
- Scilab: Scilab is an open source language for numerical computation. Its syntax is similar to MATLAB's.
- Octave: Octave is a high-level language used for numerical computation.
- Mahout: It is a library of machine learning algorithms built on top of Apache Hadoop and MapReduce.
R Vs Mahout
Apache Hadoop is used for processing big data. Mahout is a machine learning system that runs on Hadoop. The major drawback of R is its memory limitation: as a rule of thumb, R needs about three times the dataset size in RAM to work comfortably. Hence Mahout is the better alternative to R when dealing with datasets too large to fit in memory.
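The three-times rule of thumb can be turned into a quick back-of-the-envelope estimate. The sketch below assumes a hypothetical dataset of 5 million rows and 20 numeric columns, and assumes 8 bytes per value (R stores numeric values as doubles); the figures are illustrative only.

```r
# Hypothetical dataset: 5 million rows, 20 numeric (double) columns.
n_rows <- 5e6
n_cols <- 20
bytes_per_double <- 8

# Sanity check of the 8-bytes-per-value assumption on a real vector
# (the measured size also includes a small object header).
measured_bytes <- as.numeric(object.size(numeric(1e6)))

# Raw size of the data, and the RAM suggested by the 3x rule of thumb.
dataset_gb <- n_rows * n_cols * bytes_per_double / 1024^3
recommended_ram_gb <- 3 * dataset_gb

cat(sprintf("Dataset: %.2f GB, suggested RAM: %.2f GB\n",
            dataset_gb, recommended_ram_gb))
```

If the suggested figure is well beyond the RAM of the machine, that is the point at which a Hadoop-based tool such as Mahout becomes worth considering.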
Model | Implementation in R | Implementation in Mahout
Decision Tree | Yes | No
Random Forest | Yes | Yes
Stepwise Logistic Regression | Yes | No
Neural Networks | Yes | No
Continuous network (Y) | Yes | No
Table: Comparison of R and Mahout
Note
R is a free statistical and graphical programming language. It contains many advanced statistical routines and runs on a variety of platforms, including UNIX, Windows and macOS. Many of the data analysis tools available today lack forward-looking insights and complex predictive analytics algorithms. As a statistical and predictive language, R addresses these gaps. It is recommended in scenarios where the different steps of an analysis should be documented for future updates.
You are probably bored of reading about R by now. I will be back with another interesting technology in the next post!
Happy Learning
Hi Anju,
Thanks for the information. Can you also share your thoughts on vectorization in R?
Vectorization is claimed to increase R's performance many times over.
Thanks,
Viney
Hi Viney,
I'll update you soon on the mentioned topic.
Regards
Anju Prasannan