[R] A comment about R:

Thomas Lumley tlumley at u.washington.edu
Tue Jan 3 20:23:18 CET 2006


On Tue, 3 Jan 2006, Peter Dalgaard wrote:
> One thing that is often overlooked, and hasn't yet been mentioned in
> the thread, is how much *simpler* R can be for certain completely
> basic tasks of practical or pedagogical relevance: Calculate a simple
> derived statistic, confidence intervals from estimate and SE,
> percentage points of the binomial distribution - using dbinom or from
> the formula, take the sum of each of 10 random samples from a set of
> numbers, etc. This is where other packages get stuck in the
> procedure+dataset mindset.

Some of these things are actually fairly straightforward in Stata. For 
example, Stata will give confidence intervals and tests for linear 
combinations of coefficients and even (using symbolic differentiation and 
the delta method) for nonlinear combinations.  The first is available in 
packages in R; the second is in "S Programming" but doesn't seem to be 
packaged.
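
For comparison, here is a minimal sketch in base R of both calculations: a confidence interval for a linear combination of coefficients, and a delta-method interval for a nonlinear one using symbolic differentiation via deriv(). The estimates and covariance matrix are made-up numbers, the names b1 and b2 are hypothetical, and this is not the code from "S Programming".

```r
# Hypothetical coefficient estimates and their covariance matrix
est <- c(b1 = 1.2, b2 = 0.8)
V <- matrix(c(0.04, 0.01,
              0.01, 0.09), nrow = 2, byrow = TRUE)

# Linear combination b1 - b2: variance is a' V a
a <- c(1, -1)
lin_est <- sum(a * est)
lin_se <- sqrt(drop(t(a) %*% V %*% a))
lin_ci <- lin_est + c(-1, 1) * qnorm(0.975) * lin_se

# Nonlinear combination b1 / b2: the delta method uses the gradient of
# the function at the estimates, obtained here by symbolic differentiation
g <- deriv(~ b1 / b2, c("b1", "b2"))
val <- eval(g, as.list(est))
grad <- drop(attr(val, "gradient"))
ratio_est <- as.numeric(val)
ratio_se <- sqrt(drop(t(grad) %*% V %*% grad))
ratio_ci <- ratio_est + c(-1, 1) * qnorm(0.975) * ratio_se
```

The intervals here are Wald-type normal approximations, which is also what the delta method assumes.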

. di Binomial(10,4,0.2)
.12087388
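
The value printed above matches an upper-tail probability, P(X >= 4) for n = 10 and p = 0.2, rather than the point probability that dbinom returns. A short sketch of the corresponding R calculations, including the "from the formula" version mentioned in the quoted text (the variable names are only for illustration):

```r
n <- 10; k <- 4; p <- 0.2

# Upper-tail probability P(X >= k), which reproduces the Stata output
upper_tail <- pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)

# Point probability P(X = k), from dbinom and from the formula
point_mass <- dbinom(k, size = n, prob = p)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

print(c(upper_tail = upper_tail, point_mass = point_mass,
        by_formula = by_formula))
```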

Taking the sum of each of ten random samples, or other things of that 
sort, obviously requires creating a new data set, but again there are 
facilities to automate this.  I have, for example, computed bootstrap 
confidence intervals for ratio or difference of medians in a service 
course using Stata.  It would be easier in R, but not that much easier.
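
As a rough illustration of the R side of that comparison, the sketch below takes the sum of each of ten random samples and then computes a percentile bootstrap interval for a ratio of medians. The data are simulated and the sample sizes, seed, and number of replicates are arbitrary assumptions; it is not the analysis used in the course.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Sum of each of ten random samples (size 5, with replacement)
# drawn from a hypothetical set of numbers
x <- c(2, 4, 4, 5, 7, 9, 10, 12)
sums <- replicate(10, sum(sample(x, 5, replace = TRUE)))

# Two hypothetical groups of positive measurements
g1 <- rexp(40, rate = 1 / 10)
g2 <- rexp(40, rate = 1 / 5)

# Percentile bootstrap: resample each group separately and
# recompute the ratio of medians each time
boot_ratio <- replicate(2000, {
  median(sample(g1, replace = TRUE)) / median(sample(g2, replace = TRUE))
})
ci <- quantile(boot_ratio, c(0.025, 0.975))
```

Replacing the division with a subtraction gives the corresponding interval for a difference of medians.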


> For much the same reason, those packages make you tend to treat
> practical data analysis as something distinct from theoretical
> understanding of the methods: You just don't use SAS or SPSS or Stata
> to illustrate the concept of a random sample by setting up a small
> simulation study as the first thing you do in a statistics class,
> whereas you could quite conceivably do it in R. (What *is* the
> equivalent of rnorm(25) in those languages, actually?)

set obs 25
gen x = invnorm(uniform())

[This does create a new data set, of course]

> Even when using SAS in teaching, I sometimes fire up R just to
> calculate simple things like
>
>  pbar <- (p1+p2)/2
>  sqrt(pbar*(1-pbar))

local pbar=(0.3+0.5)/2
display sqrt(`pbar'*(1-`pbar'))

Now, I still prefer R both for data analysis and (even more so) for 
programming. There are some things that it is genuinely difficult to 
program in Stata -- and as evidence that this isn't just my ignorance of 
the best approaches, the language was substantially reworked in both 
versions 8 and 9 to allow the vendor to implement better graphics and
linear mixed models.

On the question of which system really is easier to learn I can only 
comment that this isn't the only question where education, as a field, 
would benefit from some good randomized controlled trials.

 	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



