[BioC] testing triangular setup for finding instances of "all are different"
wraff at igbmc.fr
Mon Oct 12 13:45:02 CEST 2009
I'm trying to find out the best procedure to treat the following
"triangular" experimental plan.
Three groups of samples (prepared as triplicates and run on 3+3+3=9
microarrays) are to be compared in the way where those elements/genes
that are different in each of the three groups should be identified (ie,
(av1 != av2) & (av1 != av3) & (av2 != av3) )
If I run an ANOVA the F value won't tell me if one or two groups are
different. From the literature I see that people then like to resolves
this again as multiple pair-wise comparisons to find out which
group-means are actually different ....
Of course the simples way might be to run three (independent) pair-wise
comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3).
However, when looking at the resulting p-values I'm not taking in
account the extra multiplicity of testing. (Of course, before testing
I'd check the data by QC and run some filtering of the data). A
work-around might be to append all p-values to a 3 * length(genes)
vector that could be used to estimate FDR or local fdr and finally
search elements/genes where all FDRs are very low (I suppose I'd have to
use the max(FDR) for a given element/gene)
On this list I've seen a post for a somehow similar (?) case ("Re:
[BioC] result of linear model", 12 june 2009) suggesting to either look
at a "p.value/ratio threshold in limma", but I don't think this would
address my questions satisfyingly. Looking at the ratio of only 2
pair-wise comparisons it might happen that (av1 != av2) and (av1 !=
av3), but if (av2 = av3) the conclusion that all three are different may
be wrong ...
So, I'm asking myself if there are more direct and elegant approaches to
this problem. I remember that in related fields a constellation called
"trio" is somehow popular, and I'm sure others may have already thought
about solutions for such a setup.
Finally, the questions remains if it is possible to develop one (single)
FDR value for estimating that a given gene might be different in each of
the three groups.
Any hints/suggestions ?
Thank's in advance,
# Here an example to illustrate ...
dat <- matrix(runif(180),nc=9)
rownames(dat) <- paste("g",1:nrow(dat),sep="_")
dat[1,] <- dat[1,] + rep(1:3,each=3) # true positive
dat[2,] <- dat[2,] + 2*rep(1:3,each=3) # another true positive
groups <- gl(3,3,labels=letters[1:3]) # organize the data in 3 groups
# basic testing in limma (emprical Bayes)
(design1 <- model.matrix( ~ -1 + groups ))
(contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"),
fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1))
topTable(fit1, coef=1,number=4) # the best of 1st pairwise
# append p-values for correcting them all together and then rearrange
in origial setup
combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"),
nc=3) # or your favorite FDR estimation method
rownames(combFDR) <- rownames(dat)
combFDR[ order(apply(combFDR,1,min))[1:10],] # the best FDR results
# and then the 'old' question of finding/defining a suitable threshold
for the FDR values remains to be addressed ...
# plot the best one ...
# for completeness :
R version 2.9.1 (2009-06-26)
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
 lattice_0.17-25 limma_2.18.2
loaded via a namespace (and not attached):
 grid_2.9.1 tools_2.9.1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
CNRS UMR7104, IGBMC,
1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
wolfgang.raffelsberger (at) igbmc.fr
More information about the Bioconductor