# [R] nonparametric multiple sample comparison

Gabor Grothendieck ggrothendieck at myway.com
Sun Apr 25 19:36:38 CEST 2004

```I have not attempted to follow this in detail but a few comments
are interspersed within your discussion and the code:

<aolinto_r <at> bignet.com.br> writes:

:
: Hello all,
:
: Here goes one of my first functions.
:
: I want to make a nonparametric multiple sample comparison with unequal
sample
: sizes (see ZarÂ’s Biostatistical Analysis, 3rd. Ed., pg. 201 Example 10.11,
pg.
: 288 Example 11.10). In the real world, I want to compare samples of fish
: length captured with different fishing gears.
:
: After using the Kruskal-Wallis test I want to check the differences between
: each two samples.
:
: I wrote the function multcomp (see below) and itÂ’s working OK but I still
have
: some doubts. To use it one must have a two-column dataframe with groups and
: measurements.
:
: 1) In line 20 (Results <- data.frameÂ....) I create a dataframe to store the
final
: results. It has a fixed row number but it would be better to have a variable
: number, i.e., the number of combination of the groups taken 2 by 2 (4 groups
=
: 5 combinations, 6 groups = 15 combinations). Which function in R returns the
: number of combinations? I couldnÂ’t find it!

?choose

:
: 2) In lines 35 to 39 I print the result tables. How can I make a "clean"
: print, without row numbers and quotation marks?

?cat

:
: 3) Also IÂ’d like to calculate the critical values "q" and p-values but I
: couldnÂ’t find the distribution indicated in ZarÂ’s Table B.15
: (App107) "Critical Values of Q for Nonparametric Multiple Comparison
Testing".
: Is it possible to calculate these numbers in R?
:
: Thanks in advance for any help or comments to improve this function.
:
: Best regards,
:
: Antonio Olinto
: Sao Paulo Fisheries Institute
: Brazil
:
: multcomp <- function(DataSet) {
: dat.multcomp <- DataSet

You don't need to copy the DataSet.  If you try to modify any object
within a function then R makes a copy of it automatically anyways.

: names(dat.multcomp) <- c("VarCat","VarNum")
: attach(dat.multcomp)

attach is generally frowned upon.  Just call it a short name such as d so that
its easy to refer to.  (Alternately wrap references to data frame in a with.)
I have assumed the data frame is called d in what follows.

: dat.multcomp\$Rank <- rank(VarNum)
: attach(dat.multcomp)
: RankList <- aggregate(Rank,list(Rank=Rank),FUN=length)
: t <- length(RankList\$Rank)
: st <- 0
: for (i in 1:t) if (RankList[i,2]>1) st<-st+(RankList[i,2]^3-RankList[i,2])

The last two statements could be reduced to:

st <- sum(ifelse( RankList[,2]>1, RankList[,2]^3-RankList[,2], 0 )

: LevCat <- levels(dat.multcomp\$VarCat)
: NLevCat <- aggregate(VarCat,list(LevCat=VarCat),FUN=length)
: RLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=sum)
: MLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=mean)
: SampleSummary <- data.frame(LevCat,RLevCat[,2],NLevCat[,2],MLevCat[,2])
: names(SampleSummary)<-c("Samples","RSum","N","RMean")

If I have followed this correctly then the above 6 lines can be reduced to:

SampleSummary <- do.call( "rbind", by(d\$Rank, d\$VarCat,
function(x) c(Rsum=sum(x),N=length(x),RMean=mean(x))))

although that produces a matrix rather than a data frame and puts the
levels as the row rather than a column so you will have to adjust
what follows.

: SampleSummary <- SampleSummary[order(SampleSummary\$RMean,decreasing=T),]
: NCat <- length(LevCat)
: N <- length(dat.multcomp\$VarCat)
: Results <- data.frame(rep(NA,6),rep(NA,6),rep(NA,6),rep(NA,6))
: names(Results) <- c("Comparison","Difference","SE","Q")
: l <- 1
: for (i in 1:(NCat-1)) {
:    for (j in NCat:(i+1)) {
:    SE <- sqrt(((N*(N+1)/12)-(st/(12*(N-1))))*((1/SampleSummary[i,3])+
: (1/SampleSummary[j,3])))
:    Dif <- SampleSummary[i,4]-SampleSummary[j,4]
:    Q=Dif/SE
: Results[l,1] <- paste(SampleSummary[i,1],"vs",SampleSummary[j,1])
: Results[l,2] <- round(Dif,4)
: Results[l,3] <- round(SE,4)
: Results[l,4] <- round(Q,4)
: l <-l+1
:    }
: }
: print("Sample summary ranked by mean ranks")
: print(SampleSummary)
: print("")
: print("Table of multiple comparisons")
: print(Results)
: }
:
: ===========
:
: Data example taken from Zar (Biostatiscical Analysis):
:
: dat.limno <- data.frame(scan(what = list (Pound=" ", pH=0), sep=" "))
: 1 7.68
: 1 7.69
: 1 7.7
: 1 7.7
: 1 7.72
: 1 7.73
: 1 7.73
: 1 7.76
: 2 7.71
: 2 7.73
: 2 7.74
: 2 7.74
: 2 7.78
: 2 7.78
: 2 7.8
: 2 7.81
: 3 7.74
: 3 7.75
: 3 7.77
: 3 7.78
: 3 7.8
: 3 7.81
: 3 7.84
: 4 7.71
: 4 7.71
: 4 7.74
: 4 7.79
: 4 7.81
: 4 7.85
: 4 7.87
: 4 7.91
:
: kruskal.test(pH~Pound)
:
:         Kruskal-Wallis rank sum test
:
: data
: Kruskal-Wallis chi-squared = 11.9435, df = 3, p-value = 0.007579
:
: multcomp(dat.limno)
:
:  "Sample summary ranked by mean ranks"
:   Samples  RSum N    RMean
: 3       3 145.0 7 20.71429
: 4       4 163.5 8 20.43750
: 2       2 132.5 8 16.56250
: 1       1  55.0 8  6.87500
:  ""
:  "Table of multiple comparisons"
:   Comparison Difference     SE      Q
: 1     3 vs 1    13.8393 4.6923 2.9493
: 2     3 vs 2     4.1518 4.6923 0.8848
: 3     3 vs 4     0.2768 4.6923 0.0590
: 4     4 vs 1    13.5625 4.5332 2.9918
: 5     4 vs 2     3.8750 4.5332 0.8548
: 6     2 vs 1     9.6875 4.5332 2.1370
:
:
: -------------------------------------------------
: WebMail Bignet - O seu provedor do litoral
: www.bignet.com.br
:
: ______________________________________________
: R-help <at> stat.math.ethz.ch mailing list
: https://www.stat.math.ethz.ch/mailman/listinfo/r-help