[BioC] Limma: how to combine duplicateCorrelation, dyeeffect and arrayweights?

Wed Nov 7 17:21:09 CET 2007

Hi,

I am quite insecure if some parts of the analyis I did in Limma are really
correct and I would highly appreciate if someone could take a look and
give advice. My main concern is that I may not use the
duplicatecorrelation, dyeeffect,arrayweights and spotweights correctly.

The arrays I use are printed in duplicates with a spacing of 15000 (so
30000 features in total), and I did the imageprocessing in GenePixPro6.1.
Thereby I flagged all spots close to backgroundsignal and with a rgn r2
<0.5 bad, and only 30% of my data remain unflagged.

And this is what I did using Limma:

> targets=readTargets("Targets_basicSat.txt")
> targets
   SlideNumber          FileName Cy3 Cy5
1            1 3096_basicSat.gpr ref   A
2            2 3079_basicSat.gpr   A ref
3            3 3089_basicSat.gpr ref   A
4            4 3081_basicSat.gpr   A ref
5            5 3071_basicSat.gpr ref   B
6            6 3082_basicSat.gpr   B ref
7            7 3085_basicSat.gpr ref   B
8            8 8268_basicSat.gpr   B ref
9            9 7829_basicSat.gpr ref   C
10          10 3086_basicSat.gpr   C ref
11          11 7823_basicSat.gpr ref   C
12          12 7826_basicSat.gpr   C ref
13          13 3090_basicSat.gpr ref   D
14          14 3091_basicSat.gpr   D ref
15          15 3092_basicSat.gpr ref   D
16          16 7827_basicSat.gpr   D ref

Every other slide is a dyeswapped technical replicate and per "group"
(A,B,C,D) there are 2 biological replicates.

> K=read.maimages(targets$FileName, source="genepix.median",
wt.fun=wtflags(0))
> types=readSpotTypes("SpottypesGAPDH.txt")
> Status=controlStatus(types, K)
> K$genes$Status=Status
> K3=backgroundCorrect(K, method=”normexp”, offset=50)
> K3=normalizeWithinArrays(K3, method="median")
> K3a=normalizeBetweenArrays(K3, method="quantile")
> design=modelMatrix(targets, ref="ref")
> design
      A  B  C  D
 [1,]  1  0  0  0
 [2,] -1  0  0  0
 [3,]  1  0  0  0
 [4,] -1  0  0  0
 [5,]  0  1  0  0
 [6,]  0 -1  0  0
 [7,]  0  1  0  0
 [8,]  0 -1  0  0
 [9,]  0  0  1  0
[10,]  0  0 -1  0
[11,]  0  0  1  0
[12,]  0  0 -1  0
[13,]  0  0  0  1
[14,]  0  0  0 -1
[15,]  0  0  0  1
[16,]  0  0  0 -1

Since I am expecting a non-negligible dyeeffect I created an other
designmatrix and the following contrastMatrix:

>design1=cbind(DyeEffect=1, design)
>design.cont=makeContrasts("A", “B“, “A-B", levels=design1)

Next I estimate the correlation of within-array-duplicates:

>cor=duplicateCorrelation(K3b, design=design1, ndups=2, spacing=15000,
weights=K3b$weights)

My first question is: is it correct to use here the designmatrix for the
dyeeffect (design1 in this case)?

When fitting the linear model, I also want to use arrayweights, combined
with spotweights. So I gave following commands:

> aw=arrayWeights(K3b, design=design1)
> w=matvec(K3b$weights, aw)

Again the question: is it correct to use here the "design1"-matrix
considering the dyeeffect?

Then I fit the linear model:

>fit=lmFit(K3b, design=design1, ndups=2, spacing=15000, cor=cor$consensus,
weights=w)
>fit1=contrasts.fit(fit, design.cont)
>eb=eBayes(fit1)

Another thing I am worried about is that taking into account the dyeeffect
plus arrayweights plus spotweights might be a bit "too much"? Like in a
way "overtransforming" my data? Especially since approx 70% of my data
have a spotweight of zero. Might it be better to use the spotweight of 0,1
for bad spots, so that I do not loose the data completely?

My apologies for this long email, I tried hard to find out the answers for
myself reading the limmaguide and lots of other documents I found
googleing, but still feel quite "stuck" in my analysis process.

Thanks very much for any kind of help in advance!
Best regards
Dorthe

-- 
Dorthe Belgardt
Institute of Basic Medical Sciences
Department of Physiology
P.O. Box 1103 Blindern
0317 Oslo
Norway