[R] Statistical Software Comparison

Wed Nov 22 18:00:58 CET 2006

On Tue, 21 Nov 2006, Kenneth Cabrera wrote:

> Hi R users:
>
> I want to know if any of you have used
> Stata or Statgraphics.

We use Stata for teaching courses aimed at graduate students in other 
departments, and also (as a consequence) on a lot of medical/public health 
research projects.  It is easier to learn than R, and has good support for 
all the methods we teach in the service courses [unlike, e.g., SPSS or 
Minitab].  Part of the reason it is easier to learn is that it has a 
very regular syntax. [There is now also a GUI, but it isn't a very good 
one, and we were using Stata for teaching before it had a GUI.]

> What are the advantages and disadvantages with
> respect to R on the following aspects?
>
> 1. Statistical functions or options for advanced
>    experimental design (fractional, mixed models,
>    greco-latin squares, split-plot, etc).

Stata is not very good at this sort of thing. Neither is R, yet, since 
lme() is really for longitudinal data and lmer() is still developing.

> 2. Bayesian approach to experimental design.

Not much here, either, in Stata.

> 3. Experimental design planning options.

Or here.

> 4. Manuals (theory included in the manuals).

Stata is excellent. They usually give formulas as well as references (and 
sometimes algorithms and computational notes that are not in the 
references).  The only problem is they keep growing and dividing, so the 
cost of a complete set goes up quite rapidly with each release (and the 
volume that you want is always on the other side of the room or lent out 
to someone).

The online help is also good. Relative to R, it suffers because the 
examples are not always directly executable.

> 5. Support (in this aspect there is no comparison with R,
>    the R list is the best known support).

The Stata list is pretty good, too.  You can see it at
http://www.hsph.harvard.edu/statalist/

> 6. Numerical stability.

For most purposes this is not really an issue and I haven't pushed Stata 
to the edge. I haven't seen any problems.

Stata does have a smaller range of built-in optimizers, and they seem to 
have stopped at the Marquardt algorithm.  This has only once been a problem 
for me (in fitting log-binomial generalized linear models), but could be a 
problem in implementing new methods.
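
One reason log-binomial models can be awkward for a general-purpose 
optimizer: with a log link the fitted probability is exp(x'b), and nothing 
keeps that below 1, so a step can land where the likelihood is undefined. 
A minimal sketch in Python (not Stata or R code, and not how either package 
fits these models); the function name, the one-covariate model, and the toy 
data are all made up for illustration:

```python
import math


def log_binomial_loglik(beta, xs, ys):
    """Log-likelihood of a log-link binomial model with one covariate.

    Illustrative model: P(y = 1 | x) = exp(beta[0] + beta[1] * x).
    Returns -inf as soon as any fitted probability reaches 1 or more,
    which is the region an unconstrained optimizer step can wander into.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        p = math.exp(beta[0] + beta[1] * x)
        if p >= 1.0:
            return float("-inf")
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical toy data: binary exposure x, binary outcome y.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

# Baseline risk 0.25, relative risk 3: fitted risks are 0.25 and 0.75.
inside = log_binomial_loglik((math.log(0.25), math.log(3.0)), xs, ys)

# Same baseline, relative risk 5: the fitted risk for x = 1 would be 1.25.
outside = log_binomial_loglik((math.log(0.25), math.log(5.0)), xs, ys)
```

Here `inside` is finite and `outside` is -inf. An optimizer has to cope 
with that boundary somehow, for example by shortening any step that would 
cross it.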

> 7. Implementation of modern statistical approaches.
>

It depends on the area. It's not bad at all in biostatistics and in some 
areas of econometrics. As with R there is also a lot of user-written code, 
some of it of excellent quality.

The Stata language is better than it looks, but some things can be easily 
programmed in it and some can't. The last two versions of Stata have 
introduced language changes in order to be able to implement better 
graphics and linear mixed models, and you can also now call C code from 
Stata, so things are improving.

Algorithms that are suited to a `one rectangular dataset' view of the 
world are often very fast in Stata, but the penalty for not vectorizing is 
even stiffer than in R.

 	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle