[R] statistical modelling SAS vs R

Frank E Harrell Jr f.harrell at vanderbilt.edu
Fri Feb 3 17:36:29 CET 2006


Bill Szkotnicki wrote:
> Hello,
> 
> Recently I have been reading a lot of material about statistical modeling
> using R. There seem to be conflicting opinions about what the best approach
> is between the SAS community and the R community.
> 1) In R one might start with a model that has all possible effects of
> interest in it and then simplify by eliminating/adding insignificant effects
> using a stepwise procedure.
> 2) In SAS one may start with a "reasonable" model and look at type 3 SS's
> to test hypotheses and report LSMEANS. This can be done in R too I think.
> 
> Does anyone have current opinions about this? I know it's been discussed
> before but I would be very interested in hearing about the advantages and
> pitfalls of both approaches.
> 
> Bill

You'll get lots of opinions about this.  Both R and SAS can be abused 
terribly, and both approaches you mentioned have major problems if you 
use P-values to specify models.  Better and more replicable results can 
be obtained using modern shrinkage methods and being more liberal with 
inclusion of variables, or by using Bayesian model averaging.
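
As a rough illustration of the shrinkage idea (a minimal sketch, not a 
recommended workflow): the data below are simulated, the variable names 
x1..x8 are made up, and ridge regression via MASS::lm.ridge is only one 
of several penalized methods one could use here.  All candidate 
predictors stay in the model, and the penalty is chosen by generalized 
cross-validation rather than by P-values.

```r
library(MASS)  # recommended package shipped with standard R installations

set.seed(1)
n <- 60
# Simulated data (an assumption for illustration): 8 candidate predictors,
# only the first three truly related to y.
x <- matrix(rnorm(n * 8), n, 8, dimnames = list(NULL, paste0("x", 1:8)))
d <- data.frame(y = drop(x[, 1:3] %*% c(1, 0.5, 0.25)) + rnorm(n), x)

# Keep every candidate predictor and fit over a grid of ridge penalties.
fit <- lm.ridge(y ~ ., data = d, lambda = seq(0, 20, by = 0.5))

# Choose the penalty with the smallest generalized cross-validation score.
best <- which.min(fit$GCV)
fit$lambda[best]

coef(fit)[best, ]  # penalized coefficients on the original scale
coef(fit)[1, ]     # lambda = 0, i.e. ordinary least squares, for comparison
```

Comparing the last two lines shows how much each coefficient is pulled 
toward zero without any variable being dropped.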

Note that LSMEANS and Type III tests are SAS concoctions, and that Type 
III tests have been criticized when the model contains interactions.  If 
there are no interactions, Type III = Type II.
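
The equivalence for main-effects-only models can be checked numerically.  
The sketch below assumes the add-on package car is installed (its Anova() 
function takes a type argument); the unbalanced two-factor data set and 
the names A, B, and y are invented for illustration.

```r
library(car)  # assumed installed; provides Anova() with a `type` argument

set.seed(2)
# Simulated, deliberately unbalanced two-factor layout (hypothetical data).
d <- data.frame(
  A = factor(sample(c("a1", "a2", "a3"), 40, replace = TRUE)),
  B = factor(sample(c("b1", "b2"), 40, replace = TRUE, prob = c(0.7, 0.3)))
)
d$y <- rnorm(40) + as.integer(d$A) + 0.5 * as.integer(d$B)

# Main effects only: no A:B interaction term.
fit <- lm(y ~ A + B, data = d)

# Sums of squares for A and B under each definition.
ss2 <- Anova(fit, type = 2)[c("A", "B"), "Sum Sq"]
ss3 <- Anova(fit, type = 3)[c("A", "B"), "Sum Sq"]
all.equal(ss2, ss3)

# Base R gives the same marginal tests without any add-on package.
drop1(fit, test = "F")
```

With an A:B term added to the formula, the two sets of sums of squares 
would generally differ, and the Type III results would then also depend 
on the contrast coding in use.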

Frank Harrell

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



