[BioC] To many differentially expressed genes produced by LIMMA and dye-effect question

Vladimir Krasikov krasikov at science.uva.nl
Tue Jan 8 01:40:35 CET 2008


Dear Naomi

Thanks a lot for the fast reply.

Naomi Altman wrote:
> Dear Vladimir,.
> I am a bit confused by the output because, while the code seems to 
> indicate that the analysis is on the log2 scale, the DecideTests 
> output does not.
>
> If the data are on the natural scale, the distributional assumptions 
> of limma are incorrect.  The denominator adjustment will pull the 
> estimated variance down too much, leading to large values of the test 
> statistic and high significance values.
>
> To check the scale of the data, just type:
>
> MA.scale[1:5,]

The scale of the data should be ordinary log2 scale of limma as I do 
everything according to the limma vignette and help-pages of the 
particular functions.
But anyway I checked the M values and they are indeed in log-scale.

> which prints the first 5 genes.  If the data are on the log scale, all 
> values should be beween 4 and 16.
>
> Also, in a dye-swap design, you really do need to consider the blocks 
> (see the limma manual).  Presumably the technical replicates are less 
> variable than the biological replicates.  Also, the additional 
> replication inflates the degrees of freedom.  Both of these things 
> also pull the estimated variance down.
I would need some help here...
I would like to use the existing power of dye-swaps technical replicates 
to get as much info as possible
Here are some variants which I've tried and all of them are producing 
strange staff:

1.
(the one I already described earlier)
 >design <- modelMatrix(targets, ref="Ref")
 >design <-cbind(Dye=1, design)
 >fit1 <-lmFit(MA.scale,design)
 >cont.matrix<-makeContrasts(Dye, A=(A1+....+A11)/11, B=(B1+...+B7)/7, 
AvsB=((A1+....+A11)/11-(B1+...+B7)/7), design)
 >fit2<-contrasts.fit(fit1,cont.matrix)
 >fit3<-eBayes(fit2)
 >d<-decideTests(fit3, adjust.method="fdr", p.value=0.001, lfc=log2(1.5))
 > summary(d)
     Dye      A           B            AvsB
-1     0     7012      7045         448
0  38127 23331    23479      36656
1      6      7790      7609         1029

amount of genes without cutoff on fold change is enormous - more than 
16.000 !!!

2a. Blocked dye-swaps
 >design <- modelMatrix(targets, ref="Ref")
 >design <-cbind(Dye=1, design)
 >biolrep<-c(1,1,2,2,3,3,...,18,18)
 >corfit <- duplicateCorrelation(MA.scale, design, ndups = 1, block=biolrep)
 >corfit$consensus.correlation
[1] NaN
 > What is wrong here?

2b.
 >design <- modelMatrix(targets, ref="Ref")
 >biolrep<-c(1,1,2,2,3,3,...,18,18)
 >corfit <- duplicateCorrelation(MA.scale, design, ndups = 1, block=biolrep)
 >corfit$consensus.correlation
[1] NaN
 > What is wrong here?

2c.
 >corfit <- duplicateCorrelation(MA.scale, ndups = 1, block=biolrep)
 >corfit$consensus.correlation
[1] -0.9645838
# finally I've got good correlation  by excluding the design from  the 
function
 >design <- modelMatrix(targets, ref="Ref")
 >design <-cbind(Dye=1, design)
 >fit1 <- lmFit(MA.scale, design, block = biolrep, cor = corfit$consensus)
 >cont.matrix<-makeContrasts(Dye, A=(A1+....+A11)/11, B=(B1+...+B7)/7, 
AvsB=((A1+....+A11)/11-(B1+...+B7)/7), design)
 >fit2<-contrasts.fit(fit1,cont.matrix)
 >fit3<-eBayes(fit2)
 > d <- decideTests(fit3, adjust.method="fdr", p.value=0.01)
 > summary(d)
     Dye      A           B            AvsB
-1  4818     6360    7950       7
0  28704    25049   22021   38098
1   4611    6724      8162      28
 >
# Oops :) No regulation

# 2d. all the same as 2c but in lmFit() excluded block variable:
 >fit1 <- lmFit(MA.scale, design, cor = corfit$consensus)
 >cont.matrix<-makeContrasts(Dye, A=(A1+....+A11)/11, B=(B1+...+B7)/7, 
AvsB=((A1+....+A11)/11-(B1+...+B7)/7), design)
 >fit2<-contrasts.fit(fit1,cont.matrix)
 >fit3<-eBayes(fit2)
 > d <- decideTests(fit3, adjust.method="fdr", p.value=0.01)
 > summary(d)
     Dye    RA   SPA RAvsSPA
-1  4818 15172 15440   10159
0  28704  6369  5335   18074
1   4611 16592 17358    9900
 >
 > d <- decideTests(fit3, adjust.method="fdr", p.value=0.01, lfc=log2(1.5))
 > summary(d)
     Dye    RA   SPA RAvsSPA
-1     0  7013  7046     450
0  38127 23330 23477   36653
1      6  7790  7610    1030
 > The same amount of regulation as in variant 1
 >

Please may you explain me how to use blocks and duplicateCorrelation 
because I don't understand outcome
and which of the above scenarios is correct?

The question about enormous amount of statistically reliable regulated 
genes still worries me...
I can't explain it

I will appreciate any comments on this matter

Regards
Vladimir Krasikov
> --Naomi
>
> At 07:51 PM 1/6/2008, you wrote:
>> Dear List
>> I hope somebody may give me an explanation of strange phenomena
>>
>> I need some explanation about my results obtained by LIMMA analysis of
>> my microarray data.
>> I have some experience with limma and some other packages from 
>> Bioconductor
>> however I must say that I'm rather biologist than bioinformatician.
>>
>> Briefly description of my experiment:
>> I'm comparing two groups of patients
>> with different forms of one disease (let's say group A and group B)
>> RNA from 18 patients (11 from A and 7 from B) were hybridized to 36 44K
>> Agilent human microarrays
>> All of microarrays were performed against common reference and each one
>> had a dye-swap hybridization.
>>
>> targets:
>>     filename          sampleID      Cy3      Cy5
>> 1   A1.ST.txt      A1                Ref        A1
>> 2   A1.DS.txt      A1                A1        Ref
>> ....
>> 21 A11.ST.txt   A11               Ref         A11
>> 22 A11.DS.txt   A11               A11       Ref
>> 23 B1.ST.txt      B1                 Ref        B1
>> 24 B1.DS.txt     B1                 B1         Ref
>> ....
>> 35 B7.ST.txt      B7                 Ref        B7
>> 36 B7.DS.txt     B7                 B7         Ref
>>
>> after importing the data, removing all types of control spots from 
>> dataset,
>> performing "loess" within array normalization like and "scale" between
>> normalization:
>>  >MA.offset <- normalizeWithinArrays(RG, method="loess",
>> bc.method="normexp", offset = 50)
>>  >MA.scale <- normalizeBetweenArrays(MA.offset, method="scale")
>>
>>  >dim(MA.scale)
>> [1] 38133   36
>>
>> Design matrix is rather simple in my case:
>>  >design <- modelMatrix(targets, ref="Ref")
>> and to account for the possible dye-effect include:
>>  >design <-cbind(Dye=1, design)
>>
>> and then linear model:
>>  >fit1 <-lmFit(MA.scale,design)
>>  >cont.matrix<-makeContrasts(Dye,
>> +                                             A=(A1+....+A11)/11,
>> +                                             B=(B1+...+B7)/7,
>> +
>> AvsB=((A1+....+A11)/11-(B1+...+B7)/7),
>> +                                             design)
>>  >fit2<-contrasts.fit(fit1,cont.matrix)
>>  >fit3<-eBayes(fit2)
>>
>> So far so good but then:
>>
>>  >d<-decideTests(fit3, adjust.method="fdr", p.value=0.001)
>>  >summary(d)
>>        Dye    A         B           AvsB
>> -1   3026  14909  14530   8161
>> 0   32497  6720    7881    21544
>> 1     2610  16504  15722  8428
>> gives me terrible amount of regulations and dye-effects:
>>
>> even with incredibly stricken adjustments and p.value cutoff I'm getting
>> ennormous amount of regulation:
>>  >d<-decideTests(fit3, adjust.method="bonferroni", p.value=0.0001)
>>  >summary(d)
>>        Dye    A         B           AvsB
>> -1   320  12390  11606   2584
>> 0   37496  12633   14242  31896
>> 1    317  13110  12285  3653
>>
>> Even if to play with cut-off on fold change the amount of the data with
>> fold change above 1.5 is 1000 genes for up-regulation...:(
>>
>> In general it can't be true as far as I understand biological problem
>> and statistical surrounding of the data,
>> It's virtualy not possible that different human patients could produce
>> such a similar expression profile to each other (I mean within one 
>> group).
>> Form other hand it is violating major assumption of microarray
>> dif.expression testing that only small proportion of the genes could be
>> differentially expressed.
>>
>> May anybody give me a hint on what I'm doing not correct in data 
>> treatment
>>
>> I tried to fit the model slightly different way and got completely
>> ironic results which I can't interpret myself at all,
>> so I'm lost afterwards
>>
>>  >biolrep <- c(1,1,2,2,3,3,....,18,18)
>>  >corfit<-duplicateCorrelation(MA.scale, ndups=1, block=biolrep)
>>  >corfit$consensus
>> [1] -0.9645838
>> which is not bad as I do understand that there is nice negative
>> correlation between dye-swaps
>> however there is nowhere stated that all this arrays are belonging to
>> two different groups (A and B),
>> (probably there is no matter for this function as it calculates the
>> correlation between each pair only)
>>
>> Thanks a lot for any help and advise.
>>
>>  > sessionInfo()
>> R version 2.6.1 (2007-11-26)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] statmod_1.3.1 limma_2.12.0
>>  >
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Which afterwards gives a really simple design:
>>
>>
>>
>>
>>
>>
>> -- 
>> V. Krasikov
>> Swammerdam Institute for Life Sciences
>> Plant Pathology
>> University of Amsterdam
>> Kruislaan 318
>> 1098SM Amsterdam
>>
>> Telephone: +31(0)20 5257839
>> Telefax: +31(0)20 5257934
>> E-mail: krasikov at science.uva.nl
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
>


-- 
V. Krasikov
Swammerdam Institute for Life Sciences
Plant Pathology
University of Amsterdam
Kruislaan 318
1098SM Amsterdam

Telephone: +31(0)20 5257839
Telefax: +31(0)20 5257934
E-mail: krasikov at science.uva.nl



More information about the Bioconductor mailing list