[R] Reasons to Use R

Stephen Tucker brown_emu at yahoo.com
Fri Apr 6 11:19:44 CEST 2007


Hi Lorenzo,

I don't think I'm qualified to provide solid information on the first
three questions, but I'd like to drop a few thoughts on (4). While
there is no shortage of language advocates out there, I'd like to
join in just this once. My background is in chemical engineering and
atmospheric science; I've done simulation on a smaller scale but spend
much of my time analyzing large sets of experimental data. I am
comfortable programming in Matlab, R, Python, C, Fortran, Igor Pro,
and I also know a little IDL but have not programmed in it
extensively.

Among these, I would count Matlab, R, Python, and IDL as good
candidates for processing large data sets, as they are high-level
languages and can work with netCDF files (which I imagine will be
used to transfer data).

Each language boasts an impressive array of libraries, but what I
think gives R the advantage for analyzing data is the level of
abstraction in the language. I am extremely impressed with the objects
available to represent data sets, and the functions support them very
well: I need to carry around fewer objects to hold information about
my data (and I don't have to "unpack" them to feed them into
functions). The language is also very "expressive" in
that it lets you write a procedure in many different ways, some
shorter, some more readable, depending on what your situation
requires. System commands and text processing are integrated into the
language, and the input/output facilities are excellent, in terms of
data and graphics. Once I have my data object I am only a few
keystrokes away from splitting, sorting, and visualizing multivariate
data; even after several years I keep discovering new functions for
basic things like manipulating data objects, descriptive statistics,
and plotting - truly, an analyst's needs have been well anticipated.
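
As a rough illustration of the "few keystrokes" point, here is a
minimal sketch using only base R. The data frame `obs` and its columns
(`site`, `gas`, `conc`) are made up for the example; they are not from
any real data set.

```r
# Hypothetical measurements: site, gas species, and a concentration value.
obs <- data.frame(
  site = c("A", "B", "A", "B", "A", "B"),
  gas  = c("O3", "O3", "NO2", "NO2", "O3", "NO2"),
  conc = c(41.2, 38.5, 12.1, 15.3, 44.0, 14.2),
  stringsAsFactors = FALSE
)

# Sort: rows ordered by site, then by decreasing concentration.
obs_sorted <- obs[order(obs$site, -obs$conc), ]

# Split: one data frame per gas, returned as a named list.
by_gas <- split(obs, obs$gas)

# Summarize: mean concentration for each site/gas combination.
means <- aggregate(conc ~ site + gas, data = obs, FUN = mean)

# Visualize: one box per gas (draws to the active graphics device).
boxplot(conc ~ gas, data = obs, ylab = "concentration")
```

Each step is a single call on the same data frame, so nothing has to
be unpacked into separate vectors first.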

A recent obsession of mine, which I was introduced to through Python,
is functional programming, and R's support for it is amazing. By
using higher-order functions like lapply(), I rarely need to rely on
for loops, which have often caused me trouble in the past because I
had forgotten to re-initialize a variable, or incremented the wrong
variable, etc. Though I'm definitely not militant about
functional programming, in general I try to write functions and then
apply them to the data (if the functions don't exist in R already),
often through higher-order functions such as lapply(). This approach
keeps most variables out of the global namespace and so I am less
likely to reassign a value to a variable that I had intended to
keep. It also makes my code more modular so that I can re-use bits of
my code as my analysis inevitably grows much larger than I had
originally intended.
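
A minimal sketch of this style, assuming a made-up list of instrument
readings; the names `readings` and `summarize_readings` are
hypothetical and only serve the example.

```r
# Hypothetical data: one numeric vector of readings per instrument.
readings <- list(
  inst1 = c(2.1, 2.4, NA, 2.2),
  inst2 = c(5.0, 4.8, 5.1),
  inst3 = c(9.7, NA, 9.9, 10.1, 9.8)
)

# A small reusable function; n and avg exist only inside each call,
# so they never appear in (or overwrite anything in) the workspace.
summarize_readings <- function(x) {
  n   <- sum(!is.na(x))
  avg <- mean(x, na.rm = TRUE)
  c(n = n, mean = avg)
}

# lapply() calls the function once per list element and returns a list.
stats_list <- lapply(readings, summarize_readings)

# sapply() does the same but simplifies the result to a matrix
# (one column per instrument, one row per statistic).
stats_mat <- sapply(readings, summarize_readings)
```

There is no loop counter to increment and no accumulator to reset
between instruments, which is where my for-loop mistakes used to come
from.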

Furthermore, my code in R ends up being much, much shorter than code I
imagine writing in other languages to accomplish the same task; I
believe this leads to fewer places for errors to occur, and the nature
of the code is immediately comprehensible (though a series of nested
functions can get pretty hard to read at times), not to mention it
takes less effort to write. This also makes it easier to interact with
the data, I think, because after making a plot I can set up for the
next plot with only a few function calls instead of setting out to
write a block of code with loops, etc.
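
To make the comparison concrete, here is a small sketch of the same
per-group mean written both ways. The vectors `temps` and `station`
are invented for the example.

```r
# Hypothetical data: temperatures tagged with a station name.
temps   <- c(14.1, 15.3, 13.8, 20.2, 21.0, 19.5)
station <- c("north", "north", "north", "south", "south", "south")

# Loop version: the result must be set up, filled, and named by hand.
loop_means <- numeric(0)
for (s in unique(station)) {
  loop_means[s] <- mean(temps[station == s])
}

# Single-call version: no counters or accumulators to manage.
tapply_means <- tapply(temps, station, mean)
```

Both give the same per-station means; the second form is the kind of
one-liner I end up typing between plots.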

I have actually recommended R to colleagues who needed to analyze the
output of large-scale air quality and global climate simulations,
and they have been extremely pleased. I think the capability for statistics
and graphics is well-established enough that I don't need to do a
hard-sell on that so much, but R's language is something I get very
excited about. I do appreciate all the contributors who have made this
available.

Best regards,
ST


--- Lorenzo Isella <lorenzo.isella at gmail.com> wrote:

> Dear All,
> The institute I work for is organizing an internal workshop for High
> Performance Computing (HPC).
> I am planning to attend it and talk a bit about fluid dynamics, but
> there is also quite a lot of interest devoted to data post-processing
> and management of huge data sets.
> A lot of people are interested in image processing/pattern recognition
> and statistic applied to geography/ecology, but I would like not to
> post this on too many lists.
> The final aim of the workshop is  understanding hardware requirements
> and drafting a list of the equipment we would like to buy. I think
> this could be the venue to talk about R as well.
> Therefore, even if it is not exactly a typical mailing list question,
> I would like to have suggestions about where to collect info about:
> (1)Institutions (not only academia) using R
> (2)Hardware requirements, possibly benchmarks
> (3)R & clusters, R & multiple CPU machines, R performance on different
> hardware.
> (4)finally, a list of the advantages for using R over commercial
> statistical packages. The money-saving in itself is not a reason good
> enough and some people are scared by the lack of professional support,
> though this mailing list is simply wonderful.
> 
> Kind Regards
> 
> Lorenzo Isella
> 



 