[BioC] To many differentially expressed genes produced by LIMMA and dye-effect question

Mon Jan 7 01:51:43 CET 2008

Dear List
I hope somebody may give me an explanation of strange phenomena

I need some explanation about my results obtained by LIMMA analysis of 
my microarray data.
I have some experience with limma and some other packages from Bioconductor
however I must say that I'm rather biologist than bioinformatician.

Briefly description of my experiment:
I'm comparing two groups of patients
with different forms of one disease (let's say group A and group B)
RNA from 18 patients (11 from A and 7 from B) were hybridized to 36 44K 
Agilent human microarrays
All of microarrays were performed against common reference and each one 
had a dye-swap hybridization.

targets:
    filename          sampleID      Cy3      Cy5
1   A1.ST.txt      A1                Ref        A1
2   A1.DS.txt      A1                A1        Ref
....
21 A11.ST.txt   A11               Ref         A11
22 A11.DS.txt   A11               A11       Ref
23 B1.ST.txt      B1                 Ref        B1
24 B1.DS.txt     B1                 B1         Ref
....
35 B7.ST.txt      B7                 Ref        B7
36 B7.DS.txt     B7                 B7         Ref

after importing the data, removing all types of control spots from dataset,
performing "loess" within array normalization like and "scale" between 
normalization:
 >MA.offset <- normalizeWithinArrays(RG, method="loess", 
bc.method="normexp", offset = 50)
 >MA.scale <- normalizeBetweenArrays(MA.offset, method="scale")

 >dim(MA.scale)
[1] 38133   36

Design matrix is rather simple in my case:
 >design <- modelMatrix(targets, ref="Ref")
and to account for the possible dye-effect include:
 >design <-cbind(Dye=1, design)

and then linear model:
 >fit1 <-lmFit(MA.scale,design)
 >cont.matrix<-makeContrasts(Dye,
+                                             A=(A1+....+A11)/11,
+                                             B=(B1+...+B7)/7,
+                                             
AvsB=((A1+....+A11)/11-(B1+...+B7)/7),
+                                             design)
 >fit2<-contrasts.fit(fit1,cont.matrix)
 >fit3<-eBayes(fit2)

So far so good but then:

 >d<-decideTests(fit3, adjust.method="fdr", p.value=0.001)
 >summary(d)
       Dye    A         B           AvsB
-1   3026  14909  14530   8161
0   32497  6720    7881    21544
1     2610  16504  15722  8428 
gives me terrible amount of regulations and dye-effects:

even with incredibly stricken adjustments and p.value cutoff I'm getting 
ennormous amount of regulation:
 >d<-decideTests(fit3, adjust.method="bonferroni", p.value=0.0001)
 >summary(d)
       Dye    A         B           AvsB
-1   320  12390  11606   2584
0   37496  12633   14242  31896
1    317  13110  12285  3653

Even if to play with cut-off on fold change the amount of the data with 
fold change above 1.5 is 1000 genes for up-regulation...:(

In general it can't be true as far as I understand biological problem 
and statistical surrounding of the data,
It's virtualy not possible that different human patients could produce 
such a similar expression profile to each other (I mean within one group). 
Form other hand it is violating major assumption of microarray 
dif.expression testing that only small proportion of the genes could be 
differentially expressed.

May anybody give me a hint on what I'm doing not correct in data treatment

I tried to fit the model slightly different way and got completely 
ironic results which I can't interpret myself at all,
so I'm lost afterwards

 >biolrep <- c(1,1,2,2,3,3,....,18,18)
 >corfit<-duplicateCorrelation(MA.scale, ndups=1, block=biolrep)
 >corfit$consensus
[1] -0.9645838
which is not bad as I do understand that there is nice negative 
correlation between dye-swaps
however there is nowhere stated that all this arrays are belonging to 
two different groups (A and B),
(probably there is no matter for this function as it calculates the 
correlation between each pair only)

Thanks a lot for any help and advise.

 > sessionInfo()
R version 2.6.1 (2007-11-26)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] statmod_1.3.1 limma_2.12.0
 >

Which afterwards gives a really simple design:

-- 
V. Krasikov
Swammerdam Institute for Life Sciences
Plant Pathology
University of Amsterdam
Kruislaan 318
1098SM Amsterdam

Telephone: +31(0)20 5257839
Telefax: +31(0)20 5257934
E-mail: krasikov at science.uva.nl