[R] anti-R vitriol

Wed Jun 30 13:46:27 CEST 2004

I am curious.  What were the dimensions of this data set?  Did this 
person use read.table(), or scan()?  Did they know about the 
possibility of reading the data one part at a time?
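
As a rough sketch of what reading one part at a time might look like
(the file, column names and chunk size below are made up for
illustration and have nothing to do with the data set under
discussion), read.table() can be called repeatedly on an open
connection, so that only one block of rows is in memory at once.
Supplying colClasses also spares read.table() from having to guess
the column types.

```r
# Hypothetical stand-in for the large file: 1000 rows, two columns.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, x = (1:1000) / 10), path,
          row.names = FALSE)

con <- file(path, open = "r")

# Read the header line once and strip the quotes that write.csv() adds.
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

chunk_size <- 300   # assumed block size; tune to the available memory
total_rows <- 0
sum_x <- 0

repeat {
  # Each call continues from where the previous one stopped.
  chunk <- tryCatch(
    read.table(con, sep = ",", nrows = chunk_size,
               col.names = header,
               colClasses = c("integer", "numeric")),
    error = function(e) NULL)   # treat a read error as end of input
  if (is.null(chunk) || nrow(chunk) == 0) break

  # Accumulate whatever summary is wanted; the chunk is then discarded.
  total_rows <- total_rows + nrow(chunk)
  sum_x <- sum_x + sum(chunk$x)
}
close(con)

mean_x <- sum_x / total_rows
```

Only running totals are kept here, so memory use depends on
chunk_size and not on the length of the file.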

The way that SAS processes the data row by row limits what can be done. 
It is often possible with scant loss of information, and more 
satisfactory, to work with a subset of the large data set or with 
multiple subsets.  Neither SAS (in my somewhat dated experience of it) 
nor R is entirely satisfactory for this purpose.  But at least in R, 
given a subset that fits so easily into memory that the graphs are not 
masses of black, there are few logistical problems in doing, rapidly and 
interactively, a variety of manipulations and plots, with each new task 
taking advantage of the learning that has gone before.  To do that well 
in the SAS world, it is necessary to use something like JMP or its 
equivalent in one of the newer modules, which process data in a way 
that is not all that different from R.

I have wondered about possibilities for a suite of functions that would 
make it easy to process through R data that is stored in one large data 
set, with a mix of adding a new variable or variables, repeating a 
calculation on successive subsets of the data, producing predictions or 
suchlike for separate subsets, etc. Database connections may be the way 
to go (cf. the Ripley and Fei Chen paper at ISI 2003), but it might 
also be useful to have a simple set of functions that would handle some 
standard requirements.
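
To make the idea concrete, here is a minimal sketch of one such
function.  The name process_in_chunks(), its arguments and the
example file are all hypothetical; nothing like this is claimed to
exist in R itself.  It repeats a user-supplied calculation on
successive blocks of rows and collects the results.

```r
# Hypothetical helper: apply `fun` to successive blocks of rows of a
# comma-separated file with a header line, returning a list of results.
process_in_chunks <- function(path, chunk_size, fun, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Header line, with any surrounding quotes removed.
  header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

  results <- list()
  repeat {
    chunk <- tryCatch(
      read.table(con, sep = ",", nrows = chunk_size,
                 col.names = header, ...),
      error = function(e) NULL)   # treat a read error as end of input
    if (is.null(chunk) || nrow(chunk) == 0) break
    results[[length(results) + 1]] <- fun(chunk)
  }
  results
}

# Made-up example data: 1000 rows in four groups of 250.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(g = rep(1:4, each = 250), y = 1:1000), path,
          row.names = FALSE)

# Repeat a calculation on each block of 250 rows.
chunk_means <- process_in_chunks(path, 250, function(d) mean(d$y))

# Add a new variable block by block and keep only a small summary.
chunk_max_sq <- process_in_chunks(path, 250, function(d) {
  d$y2 <- d$y^2
  max(d$y2)
})
```

Because fun can be any function of a data frame, the same loop would
serve for fitting a model to each subset or producing predictions
from it.  Note that the tryCatch() above would also hide genuine read
errors, so a real version would want to be more careful.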

John Maindonald.

On 30 Jun 2004, at 8:02 PM, Barry Rowlingson 
<B.Rowlingson at lancaster.ac.uk> wrote:

> A colleague is receiving some data from another person. That person 
> reads the data in SAS and it takes 30s and uses 64k RAM. That person 
> then tries to read the data in R and it takes 10 minutes and uses a 
> gigabyte of RAM. Person then goes on to say:
>
>   It's not that I think SAS is such great software,
>   it's not.  But I really hate badly designed
>   software.  R is designed by committee.  Worse,
>   it's designed by a committee of statisticians.
>   They tend to confuse numerical analysis with
>   computer science and don't have any idea about
>   software development at all.  The result is R.
>
>   I do hope [your colleague] won't have to waste time doing
>   [this analysis] in an outdated and poorly designed piece
>   of software like R.
>
> Would any of the "committee" like to respond to this? Or shall we just 
> slap our collective forehead and wonder how someone could get such a 
> view?
>
John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.