[R] [OT] 1 vs 2-way anova technical question

Giovanni Azua bravegag at gmail.com
Mon Nov 21 12:04:43 CET 2011


I know there are plenty of people in this group who can give me a good answer :)

I have a 2^k factorial design with k=4, which I model like this:
Model 1) R~A*B*C*D

Using "*" between all the factors in the R formula means, as I understand it, that all interactions are explored and included in the model; I think this would be the so-called 2-way ANOVA. However, if I do this, the model assumptions are violated: the homoscedasticity assumption does not hold, the normality assumption for the errors (residuals) does not hold, etc. I tried to correct these issues with several standard transformations (log, sqrt, Box-Cox, etc.), but none of them really improves the result. Even though the model assumptions do not hold, some of the interactions are found to significantly influence the response variable. Should I then trust the results of Model 1), given that its assumptions do not hold?
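For reference, here is a minimal sketch of the Box-Cox family mentioned above. It is written in Python purely as an illustration, since the thread itself is about R. It shows only the transform for a fixed λ, not how λ is chosen. In R, λ is usually chosen by profiling the likelihood of the fitted model, e.g. with `MASS::boxcox`. The sketch also shows that the log and square-root transformations correspond, up to an affine rescaling, to λ = 0 and λ = 0.5 of the same family.

```python
import math

def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform of a strictly positive response value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive responses")
    if lam == 0:
        # limiting case of (y**lam - 1) / lam as lam -> 0
        return math.log(y)
    return (y ** lam - 1) / lam

# lam = 0.5 is an affine rescaling of sqrt(y): 2 * (sqrt(y) - 1)
print(box_cox(4.0, 0.5))  # 2.0
# lam = 0 is the natural log
print(box_cox(1.0, 0))    # 0.0
```

Because an affine rescaling does not change which model terms come out significant, the sqrt and log transformations are effectively just two points on the Box-Cox curve.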

Then I tried another model that excludes the interactions (is this the 1-way ANOVA?):
Model 2) R~A+B+C+D

For this model the assumptions hold, apart from some outliers and a slightly heavy tail in the QQ plot.
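The following sketch shows which terms the two formulas contain. It is Python for illustration only, and the helper `expand_full_factorial` is hypothetical, not part of R. `A*B*C*D` expands to the four main effects plus all 11 interactions, 15 effect terms in total, which matches the 2^4 − 1 effects of a 2^4 design. `A+B+C+D` keeps only the four main effects. For comparison, a model with four crossed factors is usually called a four-way ANOVA. The additive model is a main-effects model rather than a one-way ANOVA, which would have a single factor.

```python
from itertools import combinations

def expand_full_factorial(factors):
    """List the terms of a fully crossed formula such as A*B*C*D:
    every main effect and every interaction, grouped by order."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            # R labels interaction terms by joining factor names with ':'
            terms.append(":".join(combo))
    return terms

full = expand_full_factorial(["A", "B", "C", "D"])   # Model 1) R~A*B*C*D
additive = ["A", "B", "C", "D"]                      # Model 2) R~A+B+C+D
print(len(full), len(additive))  # 15 4
print([t for t in full if t not in additive])  # the 11 interaction terms
```

R's own `terms()` may list the two-way interactions in a different order, but the set of terms is the same.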

Given that the assumptions for Model 1) do not hold, should I ignore its results altogether, or can I still safely use the Sum Sq column of Model 1) to build my percent-of-variation table?
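Here is a minimal sketch of how a percent-of-variation table is typically built from sums of squares in a replicated 2^k design with ±1 coding. Each effect's share is 100 · SS_effect / SST. Because the effect columns are orthogonal, the effects plus the error add up to 100%. The function name and the small data set are hypothetical and for illustration only. The code is Python rather than R.

```python
import math
from itertools import combinations, product

def percent_of_variation(k, responses):
    """Allocate the total variation of a replicated 2^k design (+/-1 coding).

    responses maps each run, a tuple of k values in {-1, +1}, to its list of
    r replicate observations (same r for every run).
    Returns (percent per effect, percent left to error)."""
    runs = list(product([-1, 1], repeat=k))
    r = len(responses[runs[0]])
    means = {run: sum(responses[run]) / r for run in runs}
    all_obs = [y for run in runs for y in responses[run]]
    grand = sum(all_obs) / len(all_obs)
    sst = sum((y - grand) ** 2 for y in all_obs)                 # total SS
    sse = sum((y - means[run]) ** 2 for run in runs for y in responses[run])

    names = "ABCDEFGH"[:k]  # hypothetical factor labels, k <= 8
    pct = {}
    for order in range(1, k + 1):
        for idx in combinations(range(k), order):
            # effect estimate q = (sign column . cell means) / 2^k,
            # where the sign column is the product of the factor columns
            q = sum(means[run] * math.prod(run[i] for i in idx) for run in runs) / 2 ** k
            ss = 2 ** k * r * q ** 2
            pct[":".join(names[i] for i in idx)] = 100 * ss / sst
    return pct, 100 * sse / sst

# Hypothetical 2^2 design with r = 2 replicates; keys are (A, B) levels
data = {(-1, -1): [10, 12], (1, -1): [20, 22], (-1, 1): [14, 16], (1, 1): [30, 28]}
effects, error = percent_of_variation(2, data)
for term, p in effects.items():
    print(f"{term:>5}: {p:5.1f}%")
print(f"error: {error:5.1f}%")
```

For this made-up data, A accounts for about 76.6% of the variation. For a balanced design like this, the Sum Sq column of R's `anova()` output on the corresponding `lm()` fit should give the same sums of squares. The decomposition itself is an algebraic property of least squares. The normality and equal-variance assumptions come in when these sums of squares are turned into F tests and p-values.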

This was a bit counter-intuitive to me, since I assumed that if there were interactions among the factors (and there are, e.g. I(A*B*C)), including them in Model 1) would make my model more accurate... OK, this has turned into a whole new topic of model selection, but I am mostly interested in this question: if a model's assumptions are violated, can I still use its results (e.g. the Sum Sq for that model), or must I not?

Can anyone advise, please?

BTW, I have bought most of the books on R and statistical analysis and gone through them all. The ANOVA coverage in most of them is very shallow, especially in the R-specific ones, which offer little more than a slightly pimped-up version of the R help pages.

I am also unofficially following an ANOVA course at the university where I am registered, but most of its examples are too simplistic: either the assumptions hold easily, or they don't hold and nothing happens.

Thanks in advance,
Best regards,
