[R] Statistical computing

Frank E Harrell Jr fharrell at virginia.edu
Mon Mar 31 17:39:04 CEST 2003


On Mon, 31 Mar 2003 09:04:25 -0500
Tanya Murphy <tmurph6 at po-box.mcgill.ca> wrote:

> Thanks to all who have replied to this. I find the advice very encouraging. 
> I've been reading the recommended links on Sweave and I think it will answer a 
> major part of my goals.
> 
> As for Perl vs. Python, I don't know which would be best. I've started out in 
> Perl because someone got me started with a little Perl program, but I've 
> looked at Python, too. I'm working in Windows (and that's not likely to change 
> anytime soon--at the office, anyway) and I think WinEdt serves as a good 
> enhanced editor for the main applications--LaTeX, R and Perl--as well as a way 
> to organize the files for a project. The GUI for Python seems nice, too, 
> though.
> 
> Saghir, why do you prefer Python?
> 
> Is there a fairly easy way to become SAS-free for data management and 
> cleaning? I'm told R is really not ideal for data cleaning. Is this what RODBC 
> is about?
> 
> Tanya

The S language is actually better than SAS for data manipulation unless you have a massive database.  The trouble is that you don't learn data manipulation by looking at the documentation of individual functions.  Chapter 4 of Alzola and Harrell attempts to provide several data manipulation/variable recoding examples.

The main reason I'm confident in saying that S is better at what many people say SAS does best is that many manipulation and recoding tasks benefit greatly from vector operations across multiple codes within a variable.  Contrast this with the multiple IF statements required in many SAS applications.
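
As a minimal sketch of the vector-recoding point (the variable name, codes, and labels here are made up for illustration and are not taken from Alzola and Harrell), a single lookup or one %in% test can stand in for a chain of IF statements:

```r
# Hypothetical coded variable; the codes and labels are invented for illustration
status <- c(1, 3, 2, 9, 1, 4, NA)

# One vectorised lookup recodes every observation at once.
# Codes missing from the table (here 9 and NA) come back as NA.
labels  <- c("1" = "single", "2" = "married", "3" = "other", "4" = "other")
status2 <- unname(labels[as.character(status)])

# Collapse several codes into a single logical variable with %in%
other <- status %in% c(3, 4)
```

Here status2 and other are ordinary vectors of the same length as status, so they can be added straight back to a data frame.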

One feature of SAS that is frequently used for data manipulation is BY with FIRST.variable and LAST.variable.  As seen in the examples I mentioned above, you handle this in a completely different way in S (using lags, aggregation functions, or for loops).
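
One way this might look in R, as a rough sketch only: the data frame d and its columns are hypothetical, and the data are assumed to be already sorted by id and then by day. duplicated() plays the role of FIRST. and LAST., tapply() does the per-group aggregation, and a shifted copy of the column gives a within-subject lag.

```r
# Hypothetical long-format data: several visits per subject, sorted by id then day
d <- data.frame(id  = c(1, 1, 1, 2, 3, 3),
                day = c(0, 7, 30, 0, 0, 14),
                sbp = c(120, 126, 131, 110, 140, 138))

# Analogues of FIRST.id and LAST.id as logical vectors
first <- !duplicated(d$id)
last  <- !duplicated(d$id, fromLast = TRUE)

baseline <- d[first, ]                     # first record for each subject
n.visits <- tapply(d$sbp, d$id, length)    # number of records per subject
max.sbp  <- tapply(d$sbp, d$id, max)       # per-subject maximum

# Lag within subject: the previous sbp, set to NA on each subject's first record
prev <- c(NA, d$sbp[-nrow(d)])
prev[first] <- NA
change <- d$sbp - prev                     # change since the previous visit
```

No explicit loop over subjects is needed for any of these; a for loop remains available for the cases that do not vectorise cleanly.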
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat


