[R] Problems with normality req. for ANOVA

Liaw, Andy andy_liaw at merck.com
Tue Aug 3 14:41:50 CEST 2010


As a matter of fact, I would say both Bert and I encounter "designed
experiments" far more often than "observational studies", yet we can say
from experience that the things Bert mentioned happen on a daily basis.
When you talk to experimenters, ask your questions carefully and you'll
see these things crop up.

Andy
 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of David Winsemius
Sent: Monday, August 02, 2010 3:35 PM
To: Bert Gunter
Cc: r-help at r-project.org; wwreith
Subject: Re: [R] Problems with normality req. for ANOVA

In a general situation of observational studies, your point is  
undoubtedly true, and apparently you believe it to be true even in the  
setting of designed experiments. Perhaps I should have confined myself  
to my first sentence.

-- 
David.


On Aug 2, 2010, at 2:05 PM, Bert Gunter wrote:

> David et al.:
>
> I take issue with this. It is the lack of independence that is the
> major issue. In particular, clustering, split-plotting, and the like
> arising from "convenience order" experimentation, lack of
> randomization, and exogenous influences (such as systematic effects of
> measurement method or location) do the most to induce bias and
> distort inference. Non-normality and unequal variances typically pale
> into insignificance by comparison.
>
> Obviously, IMHO.
>
> Note 1: George Box noted this at least 50 years ago, in the early
> '60s, when he and Jenkins developed ARIMA modeling.
>
> Note 2: If you can, have a look at Jack Youden's classic paper  
> "Enduring Values", which comments to some extent on these issues,  
> here: http://www.jstor.org/pss/1266913
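>
> To illustrate the independence point with a concrete (hypothetical)
> case: if runs are clustered by, say, measurement day or plot, one way
> to acknowledge that structure instead of ignoring it is a
> random-effects term. A minimal sketch, assuming lme4 and made-up names
> (y, f1, day, dat) that are not from this thread:
>
>   library(lme4)
>   ## random intercept for the clustering unit ("day" is hypothetical);
>   ## ignoring it would understate the standard errors of the f1 effects
>   fit_mm <- lmer(y ~ f1 + (1 | day), data = dat)
>   summary(fit_mm)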
>
> Cheers,
> Bert
>
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
>
>
> On Mon, Aug 2, 2010 at 10:32 AM, David Winsemius
> <dwinsemius at comcast.net> wrote:
>
> On Aug 2, 2010, at 9:33 AM, wwreith wrote:
>
>
> I am conducting an experiment with four independent variables, each
> of which has three or more factor levels. The sample size is quite
> large, i.e. several thousand. The dependent variable data does not
> pass a normality test but "visually" looks close to normal, so is
> there a way to compute the effect this would have on the p-value for
> ANOVA, or is there a way to perform a nonparametric test in R that
> will handle this many independent variables? Simply saying ANOVA is
> robust to small departures from normality is not going to be good
> enough for my client.
>
> The statistical assumption of normality for linear models does not
> apply to the distribution of the dependent variable, but rather to
> the residuals after a model is estimated. Furthermore, it is the
> homoskedasticity assumption that is more commonly violated and also
> the greater threat to validity. (And if you don't already know both
> of these points, then you desperately need to review your basic
> modeling practices.)
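>
> A minimal sketch of what checking those two assumptions looks like in
> R (the names y, f1, ..., f4, and dat are hypothetical, not from the
> original post):
>
>   ## fit the four-factor ANOVA first, then inspect the residuals
>   fit <- aov(y ~ f1 + f2 + f3 + f4, data = dat)
>   r <- residuals(fit)
>
>   ## the normality check belongs on the residuals, not on y itself
>   qqnorm(r); qqline(r)
>
>   ## rough homoskedasticity check: residuals vs fitted values
>   plot(fitted(fit), r); abline(h = 0, lty = 2)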
>
>
>  I need to compute an error amount for
> ANOVA or find a nonparametric equivalent.
>
> You might get a better answer if you expressed the first part of  
> that question in unambiguous terminology.  What is "error amount"?
>
> For the second part, there is an entire Task View on Robust  
> Statistical Methods.
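>
> If, despite the above, a distribution-free fallback is still wanted,
> one simple base-R possibility is a rank-transform ANOVA in the spirit
> of Conover: run the same model on the ranked response. This is an
> approximate procedure, not an exact nonparametric test, and the names
> below are again hypothetical:
>
>   ## rank-transform the response and rerun the same four-factor model
>   fit_rank <- aov(rank(y) ~ f1 + f2 + f3 + f4, data = dat)
>   summary(fit_rank)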
>
> -- 
>
> David Winsemius, MD
> West Hartford, CT
>
>
>
>

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


