[R] how to perform multiple comparison?

laomeng_3 laomeng_3 at 163.com
Fri May 20 13:53:13 CEST 2016

On 2016-05-20 09:48 , David Winsemius Wrote:

On May 19, 2016, at 5:19 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi laomeng_3,
> Have a look at the padjust function (stats).
> Jim
> On Fri, May 20, 2016 at 1:56 AM, laomeng_3 <laomeng_3 at 163.com> wrote:
>> Hi all:
>> For ANOVA, we can perform multiple comparisons via TukeyHSD.
>> But for a chi-square test on a frequency table, how can multiple comparisons be performed?
>> For example, suppose I want to compare the proportions in 3 samples (the data have 3 rows, one per sample, and 2 columns, holding the positive and negative counts respectively).
>>
>> dat<-matrix(c(6,30,8,23,14,3),nrow=3)
>> dat
>>      [,1] [,2]
>> [1,]    6   23
>> [2,]   30   14
>> [3,]    8    3
>> chisq.test(dat)
>>
>>       Pearson's Chi-squared test
>> data:  dat
>> X-squared = 17.9066, df = 2, p-value = 0.0001293
>> The result shows that the difference between the 3 samples is significant. But if I want to perform multiple comparisons to find out which pairs of samples are significantly different, which function should be used?
It appears your question is which row(s) contribute most to the overall test of independence. The object returned by `chisq.test(.)` (which holds more than what its print method shows) has a component named residuals. (Read the help page: ?chisq.test)

x2 <- chisq.test(dat)
x2$residuals
           [,1]       [,2]
[1,] -2.3580463  2.4731398
[2,]  1.4481733 -1.5188569
[3,]  0.9323855 -0.9778942
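
For readers wondering what those numbers are: they are Pearson residuals, (observed - expected) / sqrt(expected), and their squares add up to the X-squared statistic. A minimal sketch that recomputes them by hand from the `dat` matrix in the question (the names `expected` and `pearson` are mine, used only for illustration):

```r
dat <- matrix(c(6, 30, 8, 23, 14, 3), nrow = 3)
x2 <- chisq.test(dat)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(dat), colSums(dat)) / sum(dat)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson <- (dat - expected) / sqrt(expected)

# Both comparisons should return TRUE
all.equal(pearson, x2$residuals, check.attributes = FALSE)
all.equal(sum(pearson^2), unname(x2$statistic))
```

The same expected counts are also available directly as `x2$expected`.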

The row sums of the squared residuals should each be distributed as a chi-squared statistic with one degree of freedom, but since testing each row separately inflates the chance of a type I error, it would be sensible to adjust the p-values using the function that Jim Lemon misspelled:

> rowSums(x2$residuals^2)
[1] 11.676803  4.404132  1.825620

> p.adjust( 1- pchisq( rowSums(x2$residuals^2), 1) )

[1] 0.001898526 0.071703921 0.176645786
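
Note that `p.adjust` called without a `method` argument uses Holm's procedure. A sketch of the same computation with the method spelled out, and with the upper tail taken directly rather than as `1 - pchisq(...)`, could look like this (the names `row_stat`, `p_raw` and `p_holm` are my own):

```r
dat <- matrix(c(6, 30, 8, 23, 14, 3), nrow = 3)
x2 <- chisq.test(dat)

# Per-row contribution to the overall X-squared statistic
row_stat <- rowSums(x2$residuals^2)

# Unadjusted upper-tail p-values on 1 degree of freedom
p_raw <- pchisq(row_stat, df = 1, lower.tail = FALSE)

# Holm adjustment, which is what p.adjust() does by default
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

Other choices listed in `p.adjust.methods`, such as "bonferroni" or "BH", can be substituted for "holm" if a different kind of error control is wanted.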

So row 1 represents the only group that is "significantly different at the conventional level" from the expectations based on the overall sample collection. I also seem to remember that there is a function named CrossTable (in a package whose name I'm forgetting) that will deliver a SAS-style tabulation of row and column chi-squared statistics.
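
If pairwise comparisons between samples are wanted, as the original question asked, one possibility in base R is `pairwise.prop.test` from the stats package. This is only a sketch, not part of the analysis above. It assumes that column 1 of `dat` holds the positive counts and column 2 the negative counts, and the name `pw` is mine. Each pair is compared with `prop.test`, which applies a continuity correction by default, and with counts this small R may warn that the chi-squared approximation is questionable.

```r
dat <- matrix(c(6, 30, 8, 23, 14, 3), nrow = 3)

# A 2-column matrix is read as (successes, failures) per row.
# Every pair of rows is compared and the p-values are Holm-adjusted.
pw <- pairwise.prop.test(dat, p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values: rows are samples 2 and 3,
# columns are samples 1 and 2
pw$p.value
```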

David.

Many thanks for your help.
>>
>> My best
>>
