[BioC] Defining Weights in marrayNorm.

Gordon Smyth smyth at wehi.edu.au
Wed Aug 6 01:36:07 MEST 2003


At 09:40 PM 5/08/2003, Josef Walker wrote:
>Dear Gordon,
>
>The flags we (Michael Watson and I) use are self-defined and attached as
>an extra column, tagged on to the end of the rest of the data in each
>individual .gpr/.txt file.
>
>It would probably also be prudent to explain more clearly our
>definitions of "Good" and "Bad". Spots could be considered as good if
>they fit into a number of different categories, based on the fact the
>data is derived from two separate images, representing the two different
>channels.
>Our definition of a GOOD spot, is one that passes all of our QC criteria
>and has a signal intensity above the thresholds we set for defining
>whether or not a spot is considered "expressed".
>
>A single spot could be considered good if it passed QC elements for both
>channels.
>A single spot could be good in one channel and bad in the other channel,
>or BAD in both, ending up with an overall assignment of BAD.
>However, if the signal in one channel is below the thresholds that
>define whether a spot is considered expressed, then the definitions
>would change, i.e. a spot with good signal in one channel and
>below-threshold signal in the other channel (which might be considered
>BAD according to some of the QC criteria) would overall be considered
>GOOD.
>
>The decision to use only those genes considered GOOD i.e. expressed in
>both channels, is based on the fact that only these genes provide good
>reliable signal from both of the channels, and that ratios derived from
>these spots would also be reliable. Ratios derived from BAD spots are
>definitely unreliable. Ratios derived from those spots with only
>background signal in both channels (unexpressed genes), or good in one
>channel and unexpressed in the other channel, are unreliable as they do
>not contain fluorescence intensity data derived from labelled cDNA in
>both channels and so could not account for any gross differences in
>signal intensity arising from this source.

I don't want to get into an argument on this topic, but there is absolutely
no reason to filter out low intensity spots before using loess
normalisation. Loess normalisation is intensity-based: it is designed to
accept the whole range of intensities.
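
As a rough illustration of this point (a sketch only, using base R's
loess() on simulated data rather than limma or marrayNorm; the variable
names and the simulated trend are assumptions made up for the example), an
intensity-dependent curve can be fitted across the full range of A-values,
low intensities included, and subtracted from the log-ratios:

```r
# Simulated data: A = average log-intensity, M = log-ratio (hypothetical values)
set.seed(1)
n <- 500
A <- runif(n, 6, 16)
M <- 0.8 - 0.05 * A + rnorm(n, sd = 0.2)  # log-ratio with an intensity-dependent trend

# Robust local-linear fit of M on A over the whole intensity range
fit <- loess(M ~ A, span = 0.4, degree = 1, family = "symmetric")

# Normalised log-ratios: subtract the fitted trend from every spot
M.norm <- M - fitted(fit)
```

Because the curve is estimated locally in A, the low-intensity spots are
corrected by the trend among their own neighbours and do not need to be
removed beforehand.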

>So just to re-cap, am I correct in thinking that if we define the
>slot/column (w), in which our self-constructed "Flags" are contained, in
>our marrayRaw objects (read into R using read.marrayRaw or read.GenePix
>and taken from the .gpr or .txt files derived from the raw images) and
>then use the maNormMain function in the marrayNorm library (not
>maNorm), setting maW = TRUE, then these weights WILL be used for
>calculating the normalised values?

Where did you get the piece of code "maW = TRUE" from? As far as I know,
there is no such command in Bioconductor. Also 'w' isn't a slot of a
marrayRaw object. You need to read the documentation carefully ...

In principle, if you put data into the 'maW' slot of a marrayRaw object, 
then the values should be used as weights by maNormMain, and this should 
happen automatically without you having to tell it to do so. However I 
haven't tried it out to check that it works and I'm not an author of that 
software. I will leave others to help you with maNormMain ...
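
What 0/1 weights are meant to achieve can be sketched in base R without
either package. This is an illustration under stated assumptions (simulated
data, hypothetical variable names, and stats::loess standing in for the
package internals), not a description of what maNormMain actually does: the
trend is fitted to the weight-1 spots only and then subtracted from every
spot.

```r
# Simulated single array (hypothetical values): 40 "bad" spots get weight 0
set.seed(2)
n <- 400
d <- data.frame(A = runif(n, 6, 16))
d$M <- 0.5 - 0.04 * d$A + rnorm(n, sd = 0.15)
w <- rep(1, n)
bad <- sample(n, 40)
w[bad] <- 0
d$M[bad] <- d$M[bad] + 3   # corrupt the bad spots so their influence would be visible

# Fit the trend using only spots with positive weight;
# surface = "direct" lets predict() evaluate the fit at any A, including
# values outside the range covered by the good spots
good <- w > 0
fit <- loess(M ~ A, data = d[good, ], span = 0.4, degree = 1,
             control = loess.control(surface = "direct"))

# Normalise all spots, good and bad, with the curve estimated from good spots
trend <- predict(fit, newdata = d)
M.norm <- d$M - trend
```

The bad spots are still normalised, but they contribute nothing to the
estimated curve.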

Gordon

>Thank you for your help so far, this mailing list is a real life-saver.
>
>Unfortunately I am away for the next 7 days so won't be able to access
>my messages, but will be looking forward to checking them when I get
>back.
>
>Best Wishes
>
>
>Joe
>
>
>Josef Walker BSc (Hons)
>PhD Student
>Memory Group
>The Edward Jenner Institute for Vaccine Research
>Compton
>Nr Newbury
>Berkshire
>RG20 7NN
>
>Tel: 01635 577905
>Fax: 01635 577901
>E-mail: Josef.walker at jenner.ac.uk
>
>
>-----Original Message-----
>From: Gordon Smyth [mailto:smyth at wehi.edu.au]
>Sent: 05 August 2003 10:43
>To: michael watson (IAH-C)
>Cc: James MacDonald; bioconductor at stat.math.ethz.ch; Josef Walker
>Subject: RE: [BioC] Defining Weights in marrayNorm.
>
>Dear Michael,
>
>I think you are not understanding exactly how the weights work. What you
>want to do really is accomplished using weights and cannot be accomplished
>by any subsetting operation. Subsetting operations have to, by their very
>nature, apply the same to every array, and this isn't what you want.
>
>1. Let me say first of all that we generally do not recommend restricting
>normalisation only to "good" spots. The normalisation routines are written
>so that they are robust, i.e., they are able to ignore groups of poor
>quality or differentially expressed genes if they don't follow the trend of
>the rest of the data. This means that a minority of poor quality spots is
>unlikely to do much harm. Very often there is some information even in the
>poorer quality spots and it is best to leave them in. This also saves lots
>of time. There are exceptions of course ...
>
>2. How are you choosing the "good quality" spots? Programs like genepix
>flag spots which they think are of questionable quality. If you are using
>flags provided by the image analysis program, then you can read in the
>weights as you read in the data. For example, if you have genepix data then
>
>RG <- read.maimages(files, source="genepix", wt.fun=wtflags(0))
>
>will give zero weight to any spot flagged by genepix as being questionable.
>When you normalise the data using
>
>MA <- normalizeWithinArrays(RG)
>
>the normalisation regressions will use only those spots which have weights
>greater than zero. This will vary between arrays and is exactly what you
>want to achieve. All the spots will be normalized, whether "good" or "bad"
>quality, but only the "good" spots will have any influence on the
>normalisation functions. The normalisation of the "good" spots will be
>exactly as if the "bad" spots were not there.
>
>3. If you have constructed the spot flags yourself, then you'll have to
>proceed something like this. Suppose you have two arrays in two genepix
>output files. Suppose the flags for the first array are stored in a vector
>called 'flag1' with 1 for good spots and 0 for bad. Suppose the flags for
>the second array are stored in a vector 'flag2'. You will read in the
>intensity data using
>
>RG <- read.maimages(files, source="genepix")
>
>Then you'll have to assemble the flags into a matrix with rows for genes
>and columns for arrays using 'cbind(flag1, flag2)'. Then you put this into
>the weight component:
>
>RG$weights <- cbind( flag1, flag2 )
>
>Now you can use
>
>MA <- normalizeWithinArrays(RG)
>
>and normalisation will use, for each array, only those spots for which the
>flags are equal to 1.
>
>4. If you have somehow constructed the flags externally to R, you will need
>to read them into R. Suppose you have the flags in a tab-delimited text
>file with one row for each gene and columns corresponding to arrays. Then
>you read them in:
>
>w <- as.matrix(read.table("myfile"))
>RG$weights <- w
>
>and then proceed as before.
>
>Hope this helps
>Gordon
>
>At 06:28 PM 5/08/2003, michael watson (IAH-C) wrote:
> >Hi
> >
 > >I think the problem that both Jo and myself are having is that we want to
 > >know how to subset data, either in limma or the marray* classes, such that
 > >we only use good quality spots in the normalisation process.
> >
 > >The problem is, the spots that are "good quality" differ from array to
 > >array, so it's not something we can set in the layout object unless we
 > >create a different layout object for each array.  So we started looking at
 > >the concept of using "weights", but really, the problem of not being able
 > >to subset our data successfully still remains.
> >
 > >So as a more generalised question, how can I use Bioconductor to normalise
 > >microarray data based only on a subset of good quality spots, the location
 > >of which will differ from array to array?
> >
> >Thanks
> >M
> >
> >-----Original Message-----
> >From: Gordon Smyth [mailto:smyth at wehi.edu.au]
> >Sent: 05 August 2003 01:26
> >To: James MacDonald
> >Cc: bioconductor at stat.math.ethz.ch; josef.walker at jenner.ac.uk
> >Subject: Re: [BioC] Defining Weights in marrayNorm.
> >
> >
> >Dear James and Jim,
> >
 > >Actually the maNorm function doesn't make use of weights, even though
 > >weights might be set in the marrayRaw object. If you look at the code for
 > >maNorm you will see that the weights are set to NULL when the call is made
 > >to maNormMain.
> >
 > >If you want to use weights for normalization you need either to use the
 > >lower level function maNormMain (which appears to use weights) or use the
 > >normalization routines in the limma package instead.
> >
 > >In limma you use read.maimages to read the data into R, perhaps picking up
 > >the quality weights from genepix or quantarray in the process. If you have
 > >made your own weights, you can simply assign them to the weights component,
 > >e.g.,
> >
 > >RG <- read.maimages(files, source="genepix")  # or the name of your image analysis program
 > >RG$weights <- your.weights  # matrix of weights: rows = spots, columns = arrays
 > >RG$printer <- list(ngrid.r=4, ngrid.c=4, nspot.r=20, nspot.c=20)  # info about array layout
 > >MA <- normalizeWithinArrays(RG)
> >
> >Gordon
> >
> >At 03:26 AM 5/08/2003, James MacDonald wrote:
 > > >From perusing the functions (particularly maNorm), it appears that the
 > > >weights are used by all normalization procedures except for "median". By
 > > >definition, a weight is in the range [0,1], so if you use 0 and 1, it
 > > >will effectively be the same as saying "don't use this" or "use this".
 > > >You can also use some more moderate values rather than completely
 > > >eliminating the 'bad' spots (e.g., simply down-weight spots that look
 > > >sketchy).
> > >
> > >
> > >I think you pass the weights using the additional argument w="maW" in
> > >your call to maNorm.
> > >
> > >HTH,
> > >
> > >Jim
> > >
> > >James W. MacDonald
> > >Affymetrix and cDNA Microarray Core
> > >University of Michigan Cancer Center
> > >1500 E. Medical Center Drive
> > >7410 CCGC
> > >Ann Arbor MI 48109
> > >734-647-5623
> > >
> > > >>> "Josef Walker" <josef.walker at jenner.ac.uk> 08/04/03 12:31PM >>>
 > > >Hi all,
 > > >
 > > >My name is Joe Walker and I am a final year PhD student attempting to
 > > >use Bioconductor to analyse a large amount of cDNA microarray data from
 > > >my thesis experiments.
 > > >
 > > >For the normalisation stage, there is the option to use weights
 > > >previously assigned to the genes.
 > > >
 > > >I wish to normalise my genes based on a quality controlled subset that
 > > >changes for each hybridisation. I think one way to do this is to use the
 > > >weights option during normalisation.
 > > >
 > > >The "slot" for the weights (maW) is assigned/loaded during the
 > > >marrayInput stage using the read.marrayRaw command (along with name.Gf
 > > >etc).
 > > >
 > > >What I am unclear of is:
 > > >
 > > >1) What form do these weights take, i.e. does 1 = use this gene and
 > > >0 = do not use this gene, are they graded, or do they have to be defined
 > > >elsewhere?
 > > >
 > > >2) Do you use these weights by simply using maW = TRUE during the
 > > >normalisation stage?
 > > >
 > > >Am I at least on the right track?
 > > >
 > > >If anyone has advice for me it would be great.
 > > >
 > > >Thanks in advance,
 > > >
 > > >Joe
> > >
 > > >Josef Walker BSc (Hons)
 > > >PhD Student
 > > >Memory Group
 > > >The Edward Jenner Institute for Vaccine Research
 > > >Compton
 > > >Nr Newbury
 > > >Berkshire
 > > >RG20 7NN
 > > >
 > > >Tel: 01635 577905
 > > >Fax: 01635 577901
 > > >E-mail: Josef.walker at jenner.ac.uk


