[R] I need arguments pro-S-PLUS and against SAS...

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Jan 8 01:31:22 CET 2008


John Sorkin wrote:
> Frank,
> I believe you are proving my point. The difference is not so much the language as the end users. I use SAS, R, and SPlus on a regular basis. For some analyses, SAS is easiest to use, for some R (or SPlus). I can be just as dangerous using SAS as I can be with R if I don't think about what I am doing and not only check the assumptions of my models, but also pay attention to the results of the checks. You see problems with SAS data sets because you know what to look for and take the trouble to look for problems. When R (or SPlus) becomes commonly used by the great unwashed public, the number of poorly done analyses in these languages will increase. The basic problem with statistical software is that by making analyses easy to do, it allows anyone to do analyses. When an unprepared person sets about doing a complex task that should demand proper training and experience, bad things happen quickly, and with high probability.
> 
> In any event, regardless of which side of the argument members of the R listserver might take, we are all deeply in your debt for the many contributions you have made not only to the R environment, but also to the R listserver. On behalf of the entire R community, thank you.
>

We'll have to have a friendly but strong disagreement about this.  I've 
watched statisticians work too many times to not believe that many will 
take the expedient route (e.g., assume linearity) when using 
non-flexible or non-powerful software (e.g., SAS).  And I don't usually 
find errors in the data because I know the data.  I find errors 
because I can say things like

  library(Hmisc)
  datadensity(mydata)      # show all raw data in small rug plots
  hist.data.frame(mydata)  # postage-stamp size histograms of all variables
                           # in the dataset
  latex(describe(mydata))  # like PROC UNIVARIATE but shows MUCH more
                           # information in MUCH less space, including a
                           # high-resolution histogram next to the tabular
                           # info for each variable
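
For readers without Hmisc at hand, here is a minimal base-R sketch of the
same kind of screening.  The toy data frame, the planted values 999 and -99,
the helper flag_outliers() and its cutoff k = 5 are all made up for
illustration; this is not what datadensity() or describe() do internally.

```r
# Hypothetical toy data; the two planted errors are invented for illustration.
set.seed(1)
mydata <- data.frame(
  age = c(round(rnorm(98, mean = 50, sd = 10)), 999, -99),  # last two are planted
  sbp = round(rnorm(100, mean = 120, sd = 15))
)

# One small histogram per numeric variable, with a rug of the raw values.
# In a non-interactive session this draws to the default graphics device.
num_vars <- names(mydata)[sapply(mydata, is.numeric)]
op <- par(mfrow = c(1, length(num_vars)), mar = c(4, 4, 2, 1))
for (v in num_vars) {
  hist(mydata[[v]], main = v, xlab = "", col = "grey")
  rug(mydata[[v]])
}
par(op)

# Flag values far from the bulk of the data with a median/MAD rule.
# (Assumed helper, not an Hmisc function.)
flag_outliers <- function(x, k = 5) {
  which(abs(x - median(x, na.rm = TRUE)) > k * mad(x, na.rm = TRUE))
}
suspects <- lapply(mydata[num_vars], flag_outliers)
```

With this toy data, suspects$age should pick out the two planted rows
(99 and 100) and suspects$sbp should come back empty; the rug marks make
the same two points visible at a glance.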

I do agree with your comment about making things easy to do.
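
On the linearity point, a small sketch of how little it takes in R to let a
covariate enter as a smooth function instead of a straight line.  The
simulated data and the choice of 4 degrees of freedom are assumptions for
illustration only; ns() comes from the splines package that ships with R,
and this uses plain lm(), not any Hmisc-specific tooling.

```r
library(splines)  # ships with R

# Simulated data (an assumption for illustration): y follows a curve in x.
set.seed(2)
n <- 200
x <- runif(n, 0, 10)
y <- 3 * sin(x / 2) + rnorm(n, sd = 0.5)

fit_linear <- lm(y ~ x)              # assumes a straight-line effect of x
fit_spline <- lm(y ~ ns(x, df = 4))  # natural cubic spline in x, 4 df

# The straight line is a special case of the natural spline, so the two
# fits are nested and can be compared with an F test.
cmp <- anova(fit_linear, fit_spline)
print(cmp)
```

A small p-value in the second row of the table is evidence that the
straight-line fit is missing real curvature in the simulated data.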

> With greatest respect and thanks,

Thanks very much for the kind words John.

Cheers

Frank

> John
> 
> John Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> 
>>>> Frank E Harrell Jr <f.harrell at vanderbilt.edu> 1/7/2008 6:41 PM >>>
> John Sorkin wrote:
>> I fear I risk being viewed as something of a curmudgeon, but the truth must be stated. S-Plus, R, SAS, etc. are all similar in that they are all tools to an end and not an end in themselves. Any one of the three can do most statistical analyses one might want to do. I could point out the strengths of any one of the programming environments, but to be fair I would then be required to point out each platform's weaknesses. In the end, what matters is the quality and abilities of the person who uses the tools, not the tools themselves. I don't think you can make a fair statement that any one is absolutely better than the others. 
>> John 
> 
> John - I must respectfully disagree at least in part.  I have noticed 
> that SAS users are far more likely to assume linearity in doing 
> regression modeling, because SAS makes it so difficult to specify that 
> you want an unknown smooth function of a covariate in a model.  SAS 
> users are also less likely to bootstrap and to validate statistical 
> models because it's such a pain to do those in SAS.  Also when I get SAS 
> datasets from companies that have paid a fortune to a SAS-based contract 
> research organization, I can quickly spot major data errors using S 
> graphics; these errors were missed by all the SAS users because of poor 
> graphics.
> 
> Frank
> 
>>
>>>>> Jeffrey J. Hallman <jhallman at frb.gov> 1/7/2008 4:09 PM >>>
>> SAS programming is easy if everything you want to do fits easily into the
>> row-at-a-time DATA step paradigm.  If it doesn't, you have to write macros,
>> which are an abomination.  DATA step statements and macros are entirely
>> different programming languages, with one doing evaluations at "compile" time,
>> and the other at "run" time.  Except that that's not really true, either:
>> witness the 'call symput()' construct.  
>>
>> Then, if you want to interact at all with the user, you need to learn SCL, a
>> third language, with its own rules.  And to do anything sophisticated with a
>> user interface (which will still look like hell), you have to learn the SAS
>> A/F toolkit built on SCL.  And of course, A/F requires you to think
>> differently yet again.
>>
>> So, to be a competent and versatile SAS programmer, you have to learn four
>> languages and four paradigms, and keep them all straight in your head while
>> programming.  Of course, hardly anyone can do this, so you usually find stacks
>> of reference documentation close at hand when you visit a SAS programmer's
>> office.
>>
>> R and Splus don't offer much in the way of GUI programming, but for problems
>> that don't require a lot of GUI, they're very nice.  You learn one language, it's
>> quite forgiving, it's interpreted and usually easy to debug, and the programs
>> you end up with are far more readable and maintainable than anything a SAS
>> programmer can turn out.  Reading my own SAS code is bad, and reading someone
>> else's is torture. 
>>
>> Do I sound like an R bigot?  Actually, I'm a Smalltalk bigot, which is even
>> nicer than R.  But R is quite usable for most things I do, and I use Smalltalk
>> for GUI-intensive stuff.  Speaking as a programmer, SAS is awful. 
>>
> 
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



