[R] question for aov and kruskal

Rolf Turner r.turner at auckland.ac.nz
Wed Mar 12 22:33:10 CET 2008


I thought your question was well expressed and that you followed the
posting guide better than most.

I'm no expert on such issues, but I'd like to kick in a few opinions
(with which others may disagree).

(1) All of the anova stuff is based on the assumption of homogeneity
     of variance.  However, my understanding is that the model is quite
     robust to violations of this assumption.  Problems may arise if
     there are small sample sizes in some cells and if the small samples
     are associated with large variances.  Otherwise there is not all
     that much of a worry.

(2) The Tukey test is indeed based on the assumption of equal sample
     sizes.  The version of the test for unbalanced data is an
     approximation.  My understanding is that it's a pretty good
     approximation.

(3) For multiple comparisons after applying the Kruskal-Wallis test:
     Experts on non-parametric statistics may know about more powerful
     methods, but I would be inclined simply to apply a Bonferroni
     correction to a collection of pairwise tests (e.g. wilcox.test).
     Just multiply the p-values by the number of pairwise comparisons
     (capping the result at 1), i.e. by k-choose-2 where k is the number
     of groups (= 3-choose-2 = 3 in your toy example).

(4) Generally speaking I would say that if a classical test and a
     non-parametric test give different answers, then your data are
     being coy about revealing their true import.  I would have very
     little faith in either answer, and would claim that you really
     need more data.

     Unfortunately this need can rarely be satisfied.  If you have to
     make a decision one way or another, then you should go with the
     non-parametric answer.

(5) Finally, my universal prescription is:  ``When in doubt, simulate.''
     I.e. simulate multiple data sets on the basis of models fitted to,
     or related to, your real data.  Run the possible tests on the
     simulated data sets.  Since these data are simulated, you know what
     the right answer is.  Count up how often you get the right answer.

     Such an exercise can be quite revealing.
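
To make point (3) concrete, here is a rough sketch using the toy data
from the original question.  The variable names (g, pairs, p_raw,
p_bonf) are my own, and exact = FALSE is my choice because the data
contain ties; treat it as one way of doing it, not the definitive one.

```r
# Toy data from the original question (quoted below).
a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
b <- c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,
       30,40,50,50,60)
g <- factor(a)

# Overall non-parametric test across the three groups.
kruskal.test(b, g)

# All k-choose-2 pairs of group labels (one pair per column).
pairs <- combn(levels(g), 2)
k2 <- ncol(pairs)

# Unadjusted two-sample Wilcoxon p-value for each pair.
# exact = FALSE: use the normal approximation, since there are ties.
p_raw <- apply(pairs, 2, function(pr)
  wilcox.test(b[g == pr[1]], b[g == pr[2]], exact = FALSE)$p.value)

# Bonferroni: multiply by the number of comparisons, cap at 1.
p_bonf <- pmin(1, p_raw * k2)
names(p_bonf) <- paste(pairs[1, ], pairs[2, ], sep = " vs ")
p_bonf

# The same adjustment via the built-in convenience function.
pw <- pairwise.wilcox.test(b, g, p.adjust.method = "bonferroni",
                           exact = FALSE)
pw$p.value
```

The hand-rolled version and pairwise.wilcox.test should agree; the
former just makes the "multiply by k-choose-2" step explicit.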
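
For point (5), a minimal sketch of such a simulation might look like
the following.  Everything about the generating model is an assumption
of mine for illustration (log-normal responses, the meanlog/sdlog
values, the seed, nsim); only the group sizes 10, 3, 7 are taken from
the toy example.  In practice you would plug in a model fitted to, or
related to, the real data.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
n <- c(10, 3, 7)                   # group sizes as in the toy example
g <- factor(rep(1:3, times = n))

# Simulate one data set from a hypothetical log-normal model and
# return the p-values of the classical and non-parametric tests.
sim_once <- function(meanlog, sdlog) {
  idx <- as.integer(g)
  y <- rlnorm(length(g), meanlog = meanlog[idx], sdlog = sdlog[idx])
  c(aov     = summary(aov(y ~ g))[[1]][["Pr(>F)"]][1],
    kruskal = kruskal.test(y, g)$p.value)
}

# Proportion of simulated data sets in which each test rejects.
reject_rate <- function(meanlog, sdlog, nsim = 1000, alpha = 0.05) {
  p <- replicate(nsim, sim_once(meanlog, sdlog))
  rowMeans(p < alpha)
}

# Truth: no group differences, so every rejection is a wrong answer.
null_rate <- reject_rate(meanlog = c(4.5, 4.5, 4.5), sdlog = c(1, 1, 1))

# Truth: group 1 is shifted, so every rejection is a right answer.
alt_rate <- reject_rate(meanlog = c(5.5, 4.5, 4.5), sdlog = c(1, 1, 1))

rbind(null = null_rate, alternative = alt_rate)
```

The first row estimates how often each test cries wolf; the second how
often it detects a real difference.  Varying sdlog across groups would
let you probe the unequal-variance worry from point (1) as well.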

HTH

		cheers,

			Rolf Turner

On 13/03/2008, at 9:19 AM, eugen pircalabelu wrote:

> Hi,
>
> My data was only a toy example that matched the real situation with
> real data, but I could not have posted the entire data set, and so I
> gave a self-contained example of what I thought was my problem. Of
> course you can see with the naked eye that the data is unbalanced
> (this was done intentionally), but like I said this was only a toy
> example, mimicking a problem from a real data set.
>
> Thank you and have a great ahead!
>
>
> David Hewitt <dhewitt37 at gmail.com> wrote:
>
>
>> I have the following problem: how appropriate is my aov model under
>> the violation of anova assumptions?
>>
>> Example:
>> a<-c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
>> b<-c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,
>>      30,40,50,50,60)
>> z<-data.frame(a, b)
>> fligner.test(z$b, factor(z$a))
>> aov(z$b~factor(z$a))->ll
>> TukeyHSD(ll)
>>
>> Now from the aov I found that my model is unbalanced, and from the
>> Fligner test I found that the assumption of homogeneity of
>> variances is rejected. Could my Tukey comparison be a valid one under
>> these violations? From what I read, the Tukey test is valid only when
>> the model is balanced and when the assumption of homogeneity of
>> variances is not rejected; am I wrong? Can anyone tell me what would
>> be the correct test in this case?
>>
>> Doing a non-parametric Kruskal-Wallis test would give me a different
>> result. But what would be the correct multiple comparison test in
>> this case?
>>
>
> You shouldn't have needed aov to tell you that the data (not the
> model) are unbalanced. I could see that without running the code!
> Seriously, you might need to think more about the type of model
> you're using, and what you want to know, and then consider how to
> estimate the effect sizes of interest.
>
>
> -----
> David Hewitt
> Virginia Institute of Marine Science
> http://www.vims.edu/fish/students/dhewitt/
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

