[BioC] Limma-design matrix for technical replication

Gordon K Smyth smyth at wehi.EDU.AU
Wed Aug 13 04:52:48 CEST 2008

```
Dear Katerina,

Well, you're starting your microarray experience with a microarray design
which is quite subtle.

The design and contrast that you give are sensible (they're recommended in the
limma User's Guide for this sort of design), but you need to understand
what it's testing.  You're testing for genes which are differentially
expressed between these three D cells vs these three N cells, relative to
technical variation.  This approach also allows you to compare the three
biological replicates if you want.

However, you might be wanting to find genes which are statistically
significant relative to biological variation, and this is harder.  In
principle, the three biological replicates can be treated as blocks, but
limma isn't smart enough to handle the dye-swaps and the blocking at the
same time.

With your experiment, you could do it like this. First create a vector
coding the dye orientation of each array (-1 for the dye-swapped arrays):

dyeswap <- c(1,1,-1,-1,1,1,-1,-1,1,1,-1,-1)

Then unswap the M-values (I don't usually recommend this):

MA2 <- MA
MA2$M <- t(t(MA$M) * dyeswap)

Your design matrix is now very simple with all M-values lined up:

design <- cbind(Dye=dyeswap,DvsN=1)

Note I've included probe-specific dye-effects here, which you may as well.

Then estimate the correlation within biological replicates:

biolrep <- c(1,1,1,1,2,2,2,2,3,3,3,3)
dupfit <- duplicateCorrelation(MA2,design,block=biolrep)
dupfit$consensus

Check that the correlation is positive. Then fit the model using it:

fit <- lmFit(MA2,design,block=biolrep,correlation=dupfit$consensus)
fit <- eBayes(fit)
topTable(fit,coef=2)

All the best
Gordon

> Date: Mon, 11 Aug 2008 17:06:02 +0200
> From: Kateřina Kepková <kepkova at iapg.cas.cz>
> Subject: [BioC] Limma-design matrix for technical replication
> To: <bioconductor at stat.math.ethz.ch>
>
> Dear all,
> As a complete newbie to microarrays, I am trying to analyze an experiment with
> the following design: Two samples (differentiated versus undifferentiated cells)
> were compared directly on a two-color oligo array, with 3 biological
> replicates (different cell sources) and 4 technical replicates (arrays) per
> biological replicate (12 arrays altogether). In every set of technical
> replicates two arrays are dye-swaps. I am not sure how to handle the
> technical and biological replication when trying to fit a linear model. We are
> interested just in the overall comparison of differentiated versus
> undifferentiated cells.
> I have arrived at the following setup:
> Targets file is:
> SlideNumber	FileName	Cy3	Cy5
> 1	1.gpr	N1	D1
> 2	2.gpr	N1	D1
> 3	3.gpr	D1	N1
> 4	4.gpr	D1	N1
> 5	5.gpr	N2	D2
> 6	6.gpr	N2	D2
> 7	7.gpr	D2	N2
> 8	8.gpr	D2	N2
> 9	9.gpr	N3	D3
> 10	10.gpr	N3	D3
> 11	11.gpr	D3	N3
> 12	12.gpr	D3	N3
>
> Where N means undifferentiated and D differentiated cells and 1-3 are
> biological replicates.
>
> Is the following design the correct one? Or is there a better way to obtain
> relevant information?
> Is this extensible to more or fewer biological replicates?
>
> design <- cbind(D1vsN1 = c(1,1,-1,-1,0,0,0,0,0,0,0,0), D2vsN2 =
> c(0,0,0,0,1,1,-1,-1,0,0,0,0), D3vsN3 = c(0,0,0,0,0,0,0,0,1,1,-1,-1))
> fit <- lmFit(MA, design)
> cont.matrix <- makeContrasts(DvsN = (D1vsN1 + D2vsN2 + D3vsN3)/3, levels =
> design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
>
>
> Sorry if I am asking something obvious and thank you in advance for your
> help.
>
> Best regards,
> Katerina
>
> ---------------------------------------------------------------------
> Katerina Kepkova
> Laboratory of developmental biology
> Department of Reproductive and Developmental Biology
> Institute of Animal Physiology and Genetics of the AS  CR, v.v.i.
> Rumburska 89, Libechov 277 21
> Czech Republic
> tel:     +420 315 639 534
> fax:     +420 315 639 510
> e-mail: kepkova at iapg.cas.cz

```
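
The unswapping step in the reply above can be illustrated without limma. What follows is a minimal base-R sketch on simulated log-ratios, not the analysis from the thread: `M` stands in for `MA$M`, and `n_genes`, `true_effect`, `dye_effect` and the noise level are all made-up values chosen for illustration. It uses ordinary least squares via `lm.fit`, so it ignores the within-replicate correlation that `duplicateCorrelation` estimates in the real workflow.

```r
# Base-R sketch with simulated data (all numeric values are assumptions).
set.seed(1)
n_genes <- 5
dyeswap <- c(1, 1, -1, -1, 1, 1, -1, -1, 1, 1, -1, -1)

# Simulated M-values (genes x arrays): the D-vs-N effect changes sign on
# dye-swapped arrays, while the dye effect keeps the same sign on every array.
true_effect <- 2
dye_effect <- 0.3
M <- sapply(dyeswap, function(s) {
  s * true_effect + dye_effect + rnorm(n_genes, sd = 0.1)
})

# Unswap: multiply each column (array) by its dye-swap sign.
M2 <- t(t(M) * dyeswap)

# The same operation written with sweep(), as a cross-check.
stopifnot(isTRUE(all.equal(M2, sweep(M, 2, dyeswap, `*`))))

# After unswapping, D-vs-N is a constant column and the dye effect
# alternates in sign, which is exactly the two-column design in the reply.
design <- cbind(Dye = dyeswap, DvsN = 1)

# Per-gene least-squares fit (rows of coefs = genes, columns = Dye, DvsN).
coefs <- t(lm.fit(design, t(M2))$coefficients)
print(round(coefs, 2))
```

In this simulation the `DvsN` column should come out close to the simulated effect and the `Dye` column close to the simulated dye effect, which shows why a single column of ones captures the D-vs-N comparison once the M-values are lined up.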