[BioC] Coercing Normalized Data to exprSet

Barry Henderson barry.henderson@ribonomics.com
Tue, 4 Feb 2003 13:36:25 -0500


Wolfgang

Thanks!  A big help.  I understand the normalization process and had
been looking at Sandrine's paper but I had not come up with the specific
derivations.  

For completeness and to help in understanding the inner workings of
BioConductor, I am also interested in direct normalization of the entire
R and G matrices.  Can you provide a little more insight into that
process? I've tried a couple of direct attempts using maNormMain, but how
to do it is not obvious from your initial response and it is not well
covered in the docs.

Thanks again

Barry



-----Original Message-----
From: Wolfgang Huber [mailto:w.huber@dkfz-heidelberg.de] 
Sent: Tuesday, February 04, 2003 12:44 PM
To: Barry Henderson
Cc: Sandrine Dudoit
Subject: RE: [BioC] Coercing Normalized Data to exprSet


Hi Barry

Sandrine: I cc you to make sure what I say is OK!

It is described in the paper "Statistical methods for identifying
differentially expressed genes in replicated cDNA microarray
experiments" by Sandrine Dudoit, Yee Hwa Yang, Matthew J. Callow, and
Terence P. Speed.

If R is the matrix of the log2 of the background-corrected unnormalized
red intensities, and G the corresponding matrix for the green intensities
(rows=spots, columns=chips), then the unnormalized M and A are:

     M = R - G
     A = (R + G)/2

Normalization in the approach of marrayNorm involves some kind of
manipulation of the matrix M, e.g. subtracting something from each
column. If you need to, you can solve the above equations for R and
G:

     R = (M+2A)/2
     G = (2A-M)/2

and obtain the "normalized" R and G. As I said, other normalization
methods work on the R and G matrices directly; which to use depends on
what you want to do.
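
The round trip above can be sketched in base R. This is only an
illustration under stated assumptions: the intensities are made-up toy
values, and subtracting each chip's median M is a stand-in for "subtracting
something from each column", not necessarily what marrayNorm does. The
names M.norm, R.norm, G.norm and RG are hypothetical.

```r
# Toy data: 4 spots (rows) x 2 chips (columns), assumed to be
# background-corrected log2 intensities.
R <- matrix(c(10, 12,  9, 11,
              11, 13,  8, 12), nrow = 4,
            dimnames = list(paste0("spot", 1:4), c("chip1", "chip2")))
G <- matrix(c( 9, 12, 10, 10,
              10, 11,  9, 12), nrow = 4, dimnames = dimnames(R))

# Unnormalized log ratio and average log intensity.
M <- R - G
A <- (R + G) / 2

# Illustrative normalization only: subtract each chip's median M
# from its column (sweep's default operation is subtraction).
M.norm <- sweep(M, 2, apply(M, 2, median))

# Solve M = R - G, A = (R + G)/2 for the single-channel values.
R.norm <- A + M.norm / 2   # same as (M + 2A)/2
G.norm <- A - M.norm / 2   # same as (2A - M)/2

# One matrix with twice as many columns as chips.
colnames(R.norm) <- paste(colnames(R), "R", sep = ".")
colnames(G.norm) <- paste(colnames(G), "G", sep = ".")
RG <- cbind(R.norm, G.norm)
```

A matrix such as RG is what would be passed as the exprs argument when
building an exprSet with one column per channel. Note that A is left
unchanged here, so only the difference between the channels is adjusted.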

Best regards
Wolfgang

Division of Molecular Genome Analysis (Poustka)
German Cancer Research Center (DKFZ)
Im Neuenheimer Feld 580
69120 Heidelberg, Germany

w.huber@dkfz.de
http://www.dkfz.de/abt0840/whuber
Tel +49-6221-424709
Fax +49-6221-42524709


> -----Original Message-----
> From: Barry Henderson [mailto:barry.henderson@ribonomics.com]
> Sent: Tuesday, February 04, 2003 6:30 PM
> To: Wolfgang Huber
> Subject: RE: [BioC] Coercing Normalized Data to exprSet
>
>
> Wolfgang
>
> Thanks for the response.  Not meaning to ask an obvious question but 
> how does x = new('exprSet', exprs = cbind(M-A, M+A)) get me to 
> normalized, logged R and G values?  Sorry, I'm not a trained 
> statistician, just trying to pick it up on the fly.
>
> Barry
>
> -----Original Message-----
> From: Wolfgang Huber [mailto:w.huber@dkfz-heidelberg.de]
> Sent: Tuesday, February 04, 2003 12:02 PM
> To: Barry Henderson
> Subject: RE: [BioC] Coercing Normalized Data to exprSet
>
>
> Hi Barry,
>
> The (M,A) representation of the data is very good for pairwise 
> comparison, but in some cases other representations may be more 
> useful. You could go back to the (normalized, logged) R and G values 
> and construct an exprSet with twice as many columns as chips, and each
> pair of columns containing the R and G values. Schematically, this is 
> done by
>
> 	x = new('exprSet', exprs = cbind(M-A, M+A))
>
> (modulo signs and factors). There are also normalization methods that 
> normalize the whole matrix cbind(Rf-Rb, Gf-Gb) [as in 
> marrayRaw] simultaneously, rather than chip-by-chip. I've no clear idea
> about the pros and cons, but just to mention it.
>
> Best regards
> Wolfgang
>
> Division of Molecular Genome Analysis
> German Cancer Research Center (DKFZ)
> Im Neuenheimer Feld 580
> 69120 Heidelberg, Germany
>
> w.huber@dkfz.de
> http://www.dkfz.de/abt0840/whuber
> Tel +49-6221-424709
> Fax +49-6221-42524709
>
>
> > -----Original Message-----
> > From: bioconductor-admin@stat.math.ethz.ch
> > [mailto:bioconductor-admin@stat.math.ethz.ch]On Behalf Of Barry 
> > Henderson
> > Sent: Tuesday, February 04, 2003 3:47 PM
> > To: bioconductor@stat.math.ethz.ch
> > Subject: [BioC] Coercing Normalized Data to exprSet
> >
> >
> > Dear List
> >
> > I have a set of two color array data I am trying to analyze with 
> > BioConductor.  The experiment is of a loop design.  I have read the 
> > data in and conormalized it leaving me with a marrayNorm object.  I 
> > have coerced that object into an exprSet object and I am now trying 
> > to understand how to filter/test on that object.  Since the
> > individual channels of the two color array data have been collapsed
> > into a single log ratio in the exprSet object, assigning covariates
> > to individual channels (samples) seems unobvious.
> >
> > Am I missing something?  Is this a challenge in dealing with loop 
> > designs?  Or is this a limitation of BioConductor with respect to 
> > this experimental design?
> >
> > Can I calculate the normalized expression, write the values out and 
> > then read them back in as an exprSet?  If so, is there a facile way 
> > to handle this process?  I've been through the docs, vignettes, and 
> > worked with the eset provided with BioBase but it simply isn't 
> > obvious to me how to work with two color, loop designs.
> >
> > Thanks in advance for any advice.  I have pasted excerpts of the 
> > marrayNorm (normalized.data) and exprSet (tox2) objects I am working
> > with below.  As you can see 45 arrays get collapsed into 45 
> > samples...
> >
> > As an added note, I have calculated normalized intensity values and 
> > written them out for input into maanova but I would like to 
> > understand how to do this in BioConductor if possible.
> >
> > Barry Henderson
> >
> >
> > > normalized.data
> > Normalized intensity data:       Object of class marrayNorm.
> >
> > Number of arrays:       45 arrays.
> >
> > A) Layout of spots on the array:
> > Array layout:    Object of class marrayLayout.
> >
> > Total number of spots:                  2688
> > Dimensions of grid matrix:              4 rows by 4 cols
> > Dimensions of spot matrices:            12 rows by 14 cols
> >
> > Currently working with a subset of 2688 spots.
> >
> > Control spots:
> > There are   2 types of controls :
> > Control  normal
> >     208    2480
> >
> >
> > Notes on layout:
> > C:/Tox2/genes.txt
> >
> > B) Samples hybridized to the array:
> > Object of class marrayInfo.
> >
> >     maLabels # of slide            Names Experiment Cy3 Experiment Cy5
> > 1   34-108-1   34-108-1  34-108-1.Rinput          Wyeth    Bezafibrate
> > 2   34-108-2   34-108-2  34-108-2.Rinput     Lovastatin          Wyeth
> > 3   ...
> >
> > =====================
> >
> > > tox2
> > Expression Set (exprSet) with
> >         2688 genes
> >         45 samples
> >                  phenoData object with 6 variables and 45 cases
> >          varLabels
> >                 : # of slide
> >                 : Names
> >                 : Experiment Cy3
> >                 : Experiment Cy5
> >                 : date
> >                 : Comments
> >
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@stat.math.ethz.ch 
> > http://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> >
>
>