[R] model simplification using Crawley as a guide

hpdutra hpdutra at yahoo.com
Tue Jan 5 23:37:06 CET 2010


So here is some information that I hope gets criticized by the
higher intelligences who posted on this topic. Beware that I'm not a
statistician; I'm just saying what I think is correct.

First, before fitting any model, check the distribution of your data. In
some cases a simple ANOVA is way better than a very complicated model. If,
after trying some data transformations, you realize that you can't fit that
model because the data have an unusual distribution, then you might consider
more complicated options such as generalized linear models or mixed models.
In that case Hadley's information comes in handy: check your residuals, plot
your models and see what kind of insight you get from them. This way you can
move on to more complicated models.

After you find a model that fits your data, you can start thinking about
simplifying it. Crawley's approach of simplifying your model by dropping the
non-significant interactions has been slammed here, but quite honestly it is
still used (I'm not saying it is correct, though), and I don't see how the
drop1() approach suggested by Peter Dalgaard is much better (again, I'm not
saying that it isn't; I just lack the knowledge to explain the benefits).
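
To make the "check residuals, then simplify" sequence concrete, here is a
minimal sketch in R. It uses the built-in ToothGrowth data (a 2 x 3
factorial) purely as a stand-in; the data set, the object names and the
choice of a Shapiro-Wilk test are my assumptions, not anything taken from
Crawley's book or from the earlier posts.

```r
# Illustrative data only: built-in ToothGrowth, with dose treated as a factor.
tg <- transform(ToothGrowth, dose = factor(dose))

# Start simple: a two-way ANOVA with the interaction.
fit_full <- lm(len ~ supp * dose, data = tg)

# Look at the residuals before trusting any p-value.
op <- par(mfrow = c(2, 2))
plot(fit_full)                      # residuals vs fitted, Q-Q plot, etc.
par(op)
shapiro.test(residuals(fit_full))   # one of several possible normality checks

# drop1() respects marginality: while the interaction is in the model,
# it only offers the interaction for deletion, not the main effects.
d1 <- drop1(fit_full, test = "F")
print(d1)

# Crawley-style step done by hand: refit without the interaction and
# compare the two nested models with an F test.
fit_add <- update(fit_full, . ~ . - supp:dose)
print(anova(fit_add, fit_full))
```

As far as I can tell, the practical difference is that drop1() refuses to
test a main effect while an interaction containing it is still in the model,
whereas deleting terms by hand from a summary table makes it easy to violate
that rule without noticing.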

Ben Bolker et al. wrote a very good paper on how to do model simplification
(see Trends in Ecology & Evolution 24(3)). At least in ecology, AIC seems to
be the most used method for model simplification, meaning that people
simplify their overparameterized models (with all the covariates they could
get) just by looking at AIC and then report the p-values. How that is any
different from a stepwise approach I don't know, and that is probably the
reason why Crawley's approach is heavily criticized.
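
The AIC-then-p-values habit described above can be sketched with step()
from base R. The mtcars data and the set of covariates are arbitrary
choices of mine, used only to show the mechanics; nothing here is meant as
a recommended workflow.

```r
# Illustrative data only: built-in mtcars with an arbitrary set of covariates.
full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)

# Backward selection by AIC (k = 2 is the AIC penalty; k = log(n) would give BIC).
sel <- step(full, direction = "backward", k = 2, trace = 0)

print(formula(sel))      # which terms survived the selection
print(AIC(full, sel))    # the selected model never has a higher AIC than the start

# The questionable part: reading p-values off the selected model as if
# no selection had taken place.
print(summary(sel)$coefficients)
```

Mechanically this is a stepwise procedure with AIC as the criterion instead
of a p-value threshold, which is why I don't see how it escapes the
criticism aimed at stepwise deletion.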

People often mix the AIC approach with the traditional frequentist approach
(p-values). If I understood correctly from the workshop that I took with
David Anderson, this mixing is considered wrong: if you decide to simplify
your models based on AIC, then you should use model averaging instead of
just reporting what was significant.
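
For what model averaging looks like in practice, here is a rough sketch
using Akaike weights computed by hand in base R. The candidate set, the
mtcars data and the new_car values are hypothetical and chosen only for
illustration. For small samples AICc is often preferred over plain AIC, and
packages such as MuMIn automate these steps; I stick to base R here to keep
the arithmetic visible.

```r
# Hypothetical candidate set on built-in data (mtcars), for illustration only.
cands <- list(
  wt       = lm(mpg ~ wt,             data = mtcars),
  wt_hp    = lm(mpg ~ wt + hp,        data = mtcars),
  wt_qsec  = lm(mpg ~ wt + qsec,      data = mtcars),
  wt_hp_qs = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

aic   <- sapply(cands, AIC)
delta <- aic - min(aic)                           # AIC differences from the best model
w     <- exp(-delta / 2) / sum(exp(-delta / 2))   # Akaike weights, which sum to 1

print(round(cbind(AIC = aic, delta = delta, weight = w), 3))

# Model-averaged prediction for one made-up observation: weight each
# model's prediction by that model's Akaike weight.
new_car <- data.frame(wt = 3, hp = 110, qsec = 18)
preds   <- sapply(cands, predict, newdata = new_car)
avg     <- sum(w * preds)
print(avg)
```

The point is that no single "best" model is picked and no term is declared
significant; every candidate contributes in proportion to its weight.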

My impression is that you are usually better off using a simple analysis
that you understand, and for which you have enough scientific background to
support your statistical inference, than using a super elaborate model with
lots of variables and fancy stats whose shortcomings you haven't mastered.
Sometimes less is more, but hey, studying harder (especially stats) can pay
off (better publications).

PS: I wish there was a book on ecological experiments (I am sure there is
one) explaining more modern approaches to analyzing factorial experiments.
Crawley's book is a good start, but it is too simplistic sometimes. I
haven't seen Bolker's book yet, so I'm not sure it covers experiments. It
certainly would be awesome if he made his course on modeling ecological data
available on YouTube.




Jim Lemon wrote:
> 
> Peter Dalgaard wrote:
>> ...
>> That'll be anti-hist()-amine, I presume?
>> 
> I would think p-necillin a more appropriate treatment.
> 
> Jim
> 
