[R] Tukey HSD plot with lines indicating (non-)significance

Karl Ove Hufthammer karl at huftis.org
Tue Jan 15 12:46:23 CET 2013


Den 2013-01-14 19:58 skreiv Richard M. Heiberger:
> Please look at the MMC (Mean-mean Multiple Comparisons) plot in the 
> HH package.
> It displays both the means and the differences.
>
> install.packages("HH") ## if you don't already have it.
> library(HH)
> ?MMC

I have now coded a very quick-and-dirty solution for my line graph.
Example follows. I use the results of the 'cld' function in multcomp
to calculcate which groups are significantly different from each other.

Important note: The LetterMatrix object contains the needed
data, but the rows are not in the same order as the original levels,
and unfortunately the row names do *not* match the level names
(spaces seems to be stripped -- I don't know why). That's the reason
I remove the spaces in the example below.

Note: The levels should preferably be ordered by the group means,
but the function works (modulo any bugs) even if they're not.
Dotted lines are then used to indicate significant differences.

Important note: There are bound to be bugs. Do check the results
before using the resulting graph.


# Calculate non-significance lines for Tukey HSD
tukeyHSDLines=function(l, removeEndLines=TRUE, 
removeSingleGroups=FALSE) {

   # Calculate Tukey HSD values
   library(multcomp)
   l.glht=glht(l, linfct = mcp(trt = "Tukey"))

   # Calculcate a compact letter display (CLD)
   l.cld=cld(l.glht)
   lmat=l.cld$mcletters$LetterMatrix[levels(d$trt),]

   # Calculate the non-significance lines that should be
   # drawn, based on the CLD data
   calc.lines=function(x) {
     r=rle(x)
     lend=cumsum(r$lengths)            # Line end index
     lstart=c(1,lend[-length(lend)]+1) # Line start index
     d2=data.frame(lstart=lstart-.4, lend=lend+.4, nodraw=!r$values)

     if( removeEndLines ) {
       if(d2$nodraw[1]) d2=d2[-1,]               # Remove 'leading' 
dotted lines
       if(d2$nodraw[nrow(d2)]) d2=d2[-nrow(d2),] # Remove 'trailing' 
dotted lines
     }
     d2
   }

   linlist=apply(lmat, 2, calc.lines)
   lindat=do.call(rbind, linlist)
   lindat$y=rep(seq_along(linlist), times=sapply(linlist, nrow))

   if (removeSingleGroups) {
     lindat=subset(lindat, !(y %in% which(tabulate(lindat$y)==1)))
     lindat$y=as.numeric(factor(lindat$y))
   }
lindat
}


# Example:
library(DAAG)
d=rice                                     # The dataset to use
levels(d$trt)=gsub(" ", "", levels(d$trt)) # Remove spaces from level 
names
# d$trt=reorder(d$trt, -d$ShootDryMass)    # Graph looks better with 
reordered factor
l=aov(ShootDryMass ~ trt, data=d)          # Fit a one-way ANOVA

lindat=tukeyHSDLines(l)
lindat$y=150-lindat$y*3

library(ggplot2)
ggplot(d, aes(x=trt, y=ShootDryMass)) + geom_boxplot(outlier.colour=NA) 
+
   geom_jitter(col="red", size=3, position=position_jitter(width=.1)) +
   geom_segment(aes(x=lstart, xend=lend, y=y, yend=y, linetype=nodraw), 
lindat)

-- 
Karl Ove Hufthammer



More information about the R-help mailing list