[BioC] problems about cDNA vs genomic arrays normalization

Mon Nov 20 19:37:37 CET 2006

Hi Yanju,

>After reading your explanation, I still have 2 questions.
>1. I had previously also applied the normalizeWithinArrays() method to this 
>dataset.  Do you think that is correct or necessary in my case?

No, you should not do normalizeWithinArrays! This assumes that most genes 
are not changing expression between the two samples on one array, and in 
your case you have every reason to expect that the 'expression' levels of 
genomic DNA will not be anything like cDNA from your experimental groups, 
as you mentioned in your first post.
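
To illustrate the problem, here is a minimal sketch in base R with simulated 
numbers (not your data; the offset of 2 and the spread are made up). It only 
mimics the simplest within-array method, median centring, but the loess-based 
methods rest on the same assumption that the log-ratios are centred on zero:

```r
# Simulated log-ratios (M) for one array: cDNA vs genomic DNA.
# The true offset of 2 is an arbitrary value chosen for illustration.
set.seed(1)
M <- rnorm(1000, mean = 2, sd = 1.5)

# Median centring: shift the log-ratios so that their median is zero.
M.centred <- M - median(M)

median(M)          # close to the simulated offset
median(M.centred)  # zero by construction: the real offset has been removed
```

Any genuine global difference between the cDNA and genomic DNA signals is 
treated as a technical artefact and subtracted away.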

>2. You said "For the statistical analysis, you use the R values 
>directly."  But normalizeBetweenArrays() generated an MAList. It 
>contains M and A values etc., but no R values (red channel 
>intensities).

It's easy to convert between RGLists, which contain R and G values, and 
MALists, which have M and A values. See 'RG.MA' and 'MA.RG' - they're 
explained at the end of the Details section of the help page for 
'normalizeWithinArrays'. Another thing - are you doing a background 
correction first? Because if you don't, and do 'normalizeWithinArrays' or 
'normalizeBetweenArrays' on an RGList that still has the Rb and Gb items in 
it, a simple background subtraction will be done automatically. This is not 
necessarily a good thing IMO, because a negative R or G value after 
subtraction will cause the M & A values for that spot to be lost (they 
become NA), so that you cannot recreate the R & G values again. Let's say 
for simplicity's sake that RG is your original RGList before any 
pre-processing, and the genomic DNA is in the Green channel on each slide. 
I would do something like this:

RG.nobg <- backgroundCorrect(RG, method="none")
         # or maybe pick "half" to avoid neg. values

MA.nobg.Gquant <- normalizeBetweenArrays(RG.nobg,method="Gquantile")
         # do a quantile normalization on the G / genomic values

RG.nobg.Gquant <- RG.MA(MA.nobg.Gquant)
         # convert the MAList back to a RGList

MA.fake <- MA.nobg.Gquant
         # create a MAList to manipulate

MA.fake$M <- log2(RG.nobg.Gquant$R)
         # replace the M values with the log2(R) values so you can do the
         # analysis on them
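
If it helps to see why the log2(R) values are used instead of M, here is a 
rough base-R sketch of what a Gquantile-style normalization does. It is not 
limma's actual code: quantile_normalize() is a hypothetical helper written for 
this example, and the intensities are simulated.

```r
# Hypothetical helper: give every column (array) the same distribution,
# namely the mean of the sorted columns. Assumes no missing values.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  apply(ranks, 2, function(r) target[r])
}

# Simulated log2 intensities for two arrays (made-up means and spreads).
set.seed(2)
n    <- 500
logG <- cbind(a1 = rnorm(n, 10, 1), a2 = rnorm(n, 11, 1.5))  # genomic DNA
logR <- cbind(a1 = rnorm(n, 9, 2),  a2 = rnorm(n, 9, 2))     # cDNA
M    <- logR - logG

# Quantile-normalize the G channel, then move R by the same amount per spot
# so that the log-ratios stay exactly as they were.
logG.q <- quantile_normalize(logG)
logR.q <- M + logG.q

M.after <- logR.q - logG.q
all.equal(M, M.after)    # M is unchanged (up to floating-point error)
summary(logR.q - logR)   # but the log2(R) values have been adjusted
```

Because M is identical before and after, an analysis on M would ignore the 
normalization entirely; the adjusted log2(R) values are where it shows up.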

You can now proceed with the analysis as if you had Affymetrix-type data, 
using MA.fake in place of your MAList. You'll have to change your design 
matrix accordingly (no -1s, because each array now contributes a single 
log2(R) value rather than a log-ratio against the reference), but the 
rest of your analysis should be the same as you have below. It gets a bit 
more complicated if the genomic DNA is not all in the G channel - after the 
background correction you have to switch the R & G values for the arrays 
that have genomic DNA in the R channel, then account for the dye effect by 
fitting a block effect using 'duplicateCorrelation'. It's very similar to 
the Technical Replication/Randomized Block section of the limma vignette.
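
A sketch of the mixed-channel case, under loud assumptions: the targets table, 
the array layout and the intensities below are all invented, and the plain 
list RG stands in for a real background-corrected RGList. The limma part only 
runs if limma and statmod are installed, and in a real analysis y would be the 
log2(R) values after the Gquantile step rather than raw simulated numbers.

```r
# Hypothetical layout: gDNA is in Cy3 (G) on arrays 1-3 and in Cy5 (R) on 4-6.
targets <- data.frame(
  Cy3 = c("gDNA", "gDNA", "gDNA", "wt", "mu", "wt"),
  Cy5 = c("wt", "mu", "mu", "gDNA", "gDNA", "gDNA"),
  stringsAsFactors = FALSE
)

# Simulated positive intensities standing in for the R and G matrices.
set.seed(3)
n  <- 200
RG <- list(R = matrix(rexp(n * 6, 1 / 500) + 50, n, 6),
           G = matrix(rexp(n * 6, 1 / 500) + 50, n, 6))

# Switch the channels on the arrays that have gDNA in R, so that gDNA ends
# up in G everywhere.
swap  <- targets$Cy5 == "gDNA"
RG.sw <- RG
RG.sw$R[, swap] <- RG$G[, swap]
RG.sw$G[, swap] <- RG$R[, swap]

# After the swap, R always holds the cDNA sample; record what it is and
# which dye it was originally labelled with.
cdna.sample <- ifelse(swap, targets$Cy3, targets$Cy5)
dye         <- ifelse(swap, "Cy3", "Cy5")

# Single-channel style design: one column per group, only 0s and 1s.
group  <- factor(cdna.sample, levels = c("wt", "mu"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  y      <- log2(RG.sw$R)
  # Treat dye as a block and estimate the within-block correlation.
  corfit <- limma::duplicateCorrelation(y, design, block = dye)
  fit    <- limma::lmFit(y, design, block = dye,
                         correlation = corfit$consensus.correlation)
  cont   <- limma::makeContrasts(WTvsMU = wt - mu, levels = design)
  fit2   <- limma::eBayes(limma::contrasts.fit(fit, cont))
}
```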

Good luck,
Jenny

>And then I fitted a linear model to my MAList using:
>    design<-modelMatrix(targets, ref="gDNA")
>    fit<-lmFit(ma.paq,design)
>I think all my subsequent analysis is based on the M values. Finally, I 
>used the eBayes function to compute summary statistics in order to detect 
>the most differentially expressed genes.
>    cont.matrix<-makeContrasts( WTvsMU=wt-mu,levels=design)
>    fit2<-contrasts.fit(fit,cont.matrix)
>    fit2<-eBayes(fit2)
>So, I have no idea how to use the R values directly. Was my code wrong?
>I was not quite sure about my code or method, because in the end it gave 
>some uninterpretable results which did not meet the expectations of the 
>biologists. That is why I am now rechecking my code and methods.  Thank you 
>again, and also Wolfgang, for your kind help.
>
>Kind regards,
>Yanju
>
>
>
>Jenny Drnevich wrote:
>
>>Hi Yanju,
>>
>>I have just been working with a couple of data sets similar to yours 
>>where a) one channel has the same reference and b) the assumptions of few 
>>differences between sample and reference are not necessarily upheld. In 
>>these cases I have been using the Rquantile or Gquantile methods of 
>>normalizeBetweenArrays() in limma. These methods will do a quantile 
>>normalization on the R or G channel indicated so they have the "same 
>>empirical distribution across arrays, leaving the M-values (log-ratios) 
>>unchanged." Say your reference is in the green channel - doing a 
>>Gquantile normalization would force all the reference values to have the 
>>same distribution, and then adjust the R channel values accordingly. For 
>>the statistical analysis, you use the R values directly because if you 
>>use the M values, it would be like you never did the normalization. If 
>>the reference is not all in the same channel, I manipulate the RGList so 
>>that they are all in the same channel, but then I also include 'dye' as a 
>>batch effect in the model.
>>
>>HTH,
>>Jenny
>>
>>At 10:32 AM 11/20/2006, yanju wrote:
>>
>>>Dear all,
>>>
>>>I have a microarray dataset derived from a common reference design.
>>>The common reference is genomic DNA.  In the usual normalization, we assume
>>>that a large fraction of genes is not differentially expressed, and then
>>>adjustment strategies are used to give the log-ratios a median (mean)
>>>of 0. But in my case, every spot would have the same observed signal in
>>>the genomic channel while the signals in the cDNA channel vary greatly.
>>>Therefore, the strategies that I just mentioned are not suitable. I was
>>>wondering how to normalize this kind of data? Are there any packages or
>>>functions that exist already? I look forward to your reply.
>>>
>>>Regards,
>>>Yanju
>>>
>>
>>
>>Jenny Drnevich, Ph.D.
>>
>>Functional Genomics Bioinformatics Specialist
>>W.M. Keck Center for Comparative and Functional Genomics
>>Roy J. Carver Biotechnology Center
>>University of Illinois, Urbana-Champaign
>>
>>330 ERML
>>1201 W. Gregory Dr.
>>Urbana, IL 61801
>>USA
>>
>>ph: 217-244-7355
>>fax: 217-265-5066
>>e-mail: drnevich at uiuc.edu
>
