[BioC] testing triangular setup for finding instances of "all are different"
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Oct 14 00:23:46 CEST 2009
Your triangular problem is called intersection-union testing (IUT) in the
decideTests(method="nestedF") in limma (just add the extra argument to
Jim's example below) was designed to give a working man's solution to this
There are more formal solutions in the statistics literature, but none of
them gives a good FDR estimate. Most IUT tests are very conservative.
It's a hard problem.
> Date: Mon, 12 Oct 2009 11:28:11 -0400
> From: "James W. MacDonald" <jmacdon at med.umich.edu>
> Subject: Re: [BioC] testing triangular setup for finding instances of
> "all are different"
> To: wraff at igbmc.fr
> Cc: Bioconductor <bioconductor at stat.math.ethz.ch>
> Hi Wolfgang,
> Maybe I am missing your point entirely, but doesn't decideTests(fit1) do
> exactly what you want? You would have to modify your contrasts matrix
> (as the third column isn't a contrast anyway) to something like this:
> > cont <- cbind(contr.matr1[,-3], c(1,0,-1))
> > colnames(cont) <- c("a-b","b-c","a-c")
> > cont
> a-b b-c a-c
> groupsa 1 0 1
> groupsb -1 1 0
> groupsc 0 -1 -1
> > fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),cont))
> > rslt <- decideTests(fit1)
> > vennDiagram(rslt)
> Wolfgang Raffelsberger wrote:
>> Dear List,
>> I'm trying to find out the best procedure to treat the following
>> "triangular" experimental plan.
>> Three groups of samples (prepared as triplicates and run on 3+3+3=9
>> microarrays) are to be compared in the way where those elements/genes
>> that are different in each of the three groups should be identified (ie,
>> (av1 != av2) & (av1 != av3) & (av2 != av3) )
>> If I run an ANOVA the F value won't tell me if one or two groups are
>> different. From the literature I see that people then like to resolves
>> this again as multiple pair-wise comparisons to find out which
>> group-means are actually different ....
>> Of course the simples way might be to run three (independent) pair-wise
>> comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). However,
>> when looking at the resulting p-values I'm not taking in account the
>> extra multiplicity of testing. (Of course, before testing I'd check the
>> data by QC and run some filtering of the data). A work-around might be
>> to append all p-values to a 3 * length(genes) vector that could be used
>> to estimate FDR or local fdr and finally search elements/genes where all
>> FDRs are very low (I suppose I'd have to use the max(FDR) for a given
>> On this list I've seen a post for a somehow similar (?) case ("Re:
>> [BioC] result of linear model", 12 june 2009) suggesting to either look
>> at a "p.value/ratio threshold in limma", but I don't think this would
>> address my questions satisfyingly. Looking at the ratio of only 2
>> pair-wise comparisons it might happen that (av1 != av2) and (av1 !=
>> av3), but if (av2 = av3) the conclusion that all three are different may
>> be wrong ...
>> So, I'm asking myself if there are more direct and elegant approaches to
>> this problem. I remember that in related fields a constellation called
>> "trio" is somehow popular, and I'm sure others may have already thought
>> about solutions for such a setup.
>> Finally, the questions remains if it is possible to develop one (single)
>> FDR value for estimating that a given gene might be different in each of
>> the three groups.
>> Any hints/suggestions ?
>> Thank's in advance,
>> Wolfgang Raffelsberger
>> # Here an example to illustrate ...
>> dat <- matrix(runif(180),nc=9)
>> rownames(dat) <- paste("g",1:nrow(dat),sep="_")
>> dat[1,] <- dat[1,] + rep(1:3,each=3) # true positive
>> dat[2,] <- dat[2,] + 2*rep(1:3,each=3) # another true positive
>> groups <- gl(3,3,labels=letters[1:3]) # organize the data in 3 groups
>> # basic testing in limma (emprical Bayes)
>> (design1 <- model.matrix( ~ -1 + groups ))
>> (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"),
>> fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1))
>> topTable(fit1, coef=1,number=4) # the best of 1st pairwise
>> # append p-values for correcting them all together and then rearrange in
>> origial setup
>> combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"),
>> nc=3) # or your favorite FDR estimation method
>> rownames(combFDR) <- rownames(dat)
>> combFDR[ order(apply(combFDR,1,min))[1:10],] # the best FDR results
>> # and then the 'old' question of finding/defining a suitable threshold
>> for the FDR values remains to be addressed ...
>> # plot the best one ...
>> stripplot(dat[ order(apply(combFDR,1,min)),],groups=groups)
>> # for completeness :
>> > sessionInfo()
>> R version 2.9.1 (2009-06-26)
>> attached base packages:
>>  stats graphics grDevices utils datasets methods base
>> other attached packages:
>>  lattice_0.17-25 limma_2.18.2
>> loaded via a namespace (and not attached):
>>  grid_2.9.1 tools_2.9.1
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>> Wolfgang Raffelsberger, PhD
>> Laboratoire de BioInformatique et G?nomique Int?gratives
>> CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg,
>> Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
>> wolfgang.raffelsberger (at) igbmc.fr
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> Search the archives:
> James W. MacDonald, M.S.
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
More information about the Bioconductor