[R] how to perform multiple comparisons?

Fri May 20 13:53:13 CEST 2016

Thanks for your help.

Sent from NetEase Mail Master
On 2016-05-20 09:48, David Winsemius wrote:

> On May 19, 2016, at 5:19 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
>
> Hi laomeng_3,
> Have a look at the padjust function (stats).
>
> Jim
>
>
> On Fri, May 20, 2016 at 1:56 AM, laomeng_3 <laomeng_3 at 163.com> wrote:
>> Hi all:
>> For ANOVA, we can perform multiple comparisons via TukeyHSD.
>> But for a chi-square test on a frequency table, how do we perform multiple comparisons?
>>
>> For example, suppose I want to compare the proportion of positives in 3 samples (the data have 3 rows, one per sample, and 2 columns, corresponding to positive and negative respectively).
>>
>>
>> dat<-matrix(c(6,30,8,23,14,3),nrow=3)
>> dat
>>      [,1] [,2]
>> [1,]    6   23
>> [2,]   30   14
>> [3,]    8    3
>>
>>
>>
>> chisq.test(dat)
>>
>>       Pearson's Chi-squared test
>>
>> data:  dat
>> X-squared = 17.9066, df = 2, p-value = 0.0001293
>>
>>
>> The result shows that the difference between the 3 samples is significant. But if I want to perform multiple comparisons to find out which pairs of samples differ significantly, which function should I use?
>>

It appears your question is which row(s) contribute most to the overall test of independence. The value returned by `chisq.test(.)` (which is not the same as what its print method displays) has a component named residuals. (Read the help page: ?chisq.test)

x2 <- chisq.test(dat)
x2$residuals
          [,1]       [,2]
[1,] -2.3580463  2.4731398
[2,]  1.4481733 -1.5188569
[3,]  0.9323855 -0.9778942
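For anyone wondering what is in that residuals component: for a two-way table, chisq.test stores the Pearson residuals, (observed - expected) / sqrt(expected), with the expected counts computed from the row and column margins. A minimal sketch that recomputes both by hand and checks them against the stored components; the object names expected_by_hand and resid_by_hand are purely illustrative.

```r
dat <- matrix(c(6, 30, 8, 23, 14, 3), nrow = 3)
x2 <- chisq.test(dat)

# Expected counts under independence: row total * column total / grand total
expected_by_hand <- outer(rowSums(dat), colSums(dat)) / sum(dat)

# Pearson residuals: (observed - expected) / sqrt(expected)
resid_by_hand <- (dat - expected_by_hand) / sqrt(expected_by_hand)

# Both should agree with the components stored in the chisq.test result
all.equal(expected_by_hand, x2$expected)
all.equal(resid_by_hand, x2$residuals)

# The sum of all squared residuals is the overall X-squared statistic
sum(resid_by_hand^2)
```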

The row sums of the squared residuals should each be distributed as a chi-squared statistic with one degree of freedom, but since you are admittedly inflating the probability of a type I error by testing every row, it would be sensible to adjust the p-values using the function that Jim Lemon misspelled (p.adjust, in package stats):

rowSums(x2$residuals^2)
[1] 11.676803  4.404132  1.825620

p.adjust(1 - pchisq(rowSums(x2$residuals^2), 1))
[1] 0.001898526 0.071703921 0.176645786
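Since p.adjust was called without a method argument, it used its default, Holm's step-down procedure. The sketch below spells that procedure out so the adjustment is visible; holm_by_hand is a hypothetical helper written for illustration, not a function in stats. It also uses pchisq(..., lower.tail = FALSE) in place of 1 - pchisq(...), which gives the same numbers here but keeps precision when p-values are tiny.

```r
dat <- matrix(c(6, 30, 8, 23, 14, 3), nrow = 3)
x2 <- chisq.test(dat)

# One statistic per row (sum of that row's squared residuals), on 1 df
row_stat <- rowSums(x2$residuals^2)
raw_p <- pchisq(row_stat, df = 1, lower.tail = FALSE)

# Holm step-down adjustment written out explicitly (illustrative helper)
holm_by_hand <- function(p) {
  m <- length(p)
  o <- order(p)                                  # smallest p-value first
  adj <- cummax((m - seq_len(m) + 1) * p[o])     # multiply by m, m-1, ..., 1; keep monotone
  pmin(1, adj)[order(o)]                         # cap at 1, restore original order
}

holm_by_hand(raw_p)
p.adjust(raw_p, method = "holm")   # should give the same numbers
```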

So row 1 represents the only group that is "significantly different at the conventional level" from the expectations based on the overall sample collection. I also seem to remember that there is a function named CrossTable (in a package whose name I'm forgetting) that will deliver a SAS-style tabulation of row and column chi-squared statistics.
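If what is wanted is literally a comparison of each pair of samples (sample 1 vs 2, 1 vs 3, 2 vs 3) rather than each row against the pooled expectation, the stats package also provides pairwise.prop.test, which runs prop.test on every pair and adjusts the resulting p-values (Holm by default). A sketch, assuming column 1 holds the positive counts as described in the question; the row and column labels are added only for readability. Note that prop.test applies Yates' continuity correction by default and will warn that the chi-squared approximation may be incorrect for pairs involving the small third sample.

```r
dat <- matrix(c(6, 30, 8, 23, 14, 3), nrow = 3,
              dimnames = list(sample = c("s1", "s2", "s3"),
                              result = c("positive", "negative")))

# Successes (positives) and trials (row totals) for each sample
positives <- dat[, "positive"]
totals <- rowSums(dat)

# prop.test on every pair of samples, p-values adjusted with Holm's method
pw <- pairwise.prop.test(positives, totals, p.adjust.method = "holm")
pw$p.value   # lower-triangular matrix of adjusted p-values (upper part is NA)
```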

--
David.

>>
>> Many thanks for your help.
>>
>> My best
>>
>>
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
