Flagging spots (was: [BioC] Bioconductor documentation)

Gordon Smyth smyth at wehi.edu.au
Wed Sep 1 02:33:26 CEST 2004


At 05:28 AM 1/09/2004, you wrote:
>The reason that we want to read in more columns is to create the 
>flags.  Some people think that spots should be flagged if (e.g.) mean(Rf) 
>differs considerably from median(Rf), or the s.d. of one of these measures 
>is large.  Right now, they need to create the flags outside of Bioconductor.

I think you might want something like

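# x is the data.frame of columns read from one GenePix file
myfun <- function(x) {
  # a channel is OK if median and mean foreground differ by less than 50
  okred <- abs(x[,"F635 Median"]-x[,"F635 Mean"]) < 50
  okgreen <- abs(x[,"F532 Median"]-x[,"F532 Mean"]) < 50
  # weight 1 if both channels are OK, otherwise 0
  as.numeric(okgreen & okred)
}
RG <- read.maimages(files, source="genepix", wt.fun=myfun)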

Then all the "bad" spots will get weight zero which, in limma, is 
equivalent to flagging them out. You can proceed with

RG$printer <- getLayout(RG$genes)
RG <- backgroundCorrect(RG) # gives more correction options
MA <- normalizeWithinArrays(RG)

to do print-tip loess normalization in which the flagged spots have no 
influence on the normalization.
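
The same approach should extend to the s.d. criterion mentioned in the 
question. What follows is only a sketch: the name myfun2, the maxdiff 
argument, the "s.d. no larger than the median" rule and the toy values are 
all made up for illustration, and the "F635 SD" and "F532 SD" column names 
are assumed to match the headers in your gpr files. It runs on a small 
hand-made data.frame, so the weights can be checked before any real files 
are read.

```r
# Toy stand-in for the data.frame that wt.fun receives.
# Column names are assumed to follow GenePix headers; values are invented.
spots <- data.frame(
  "F635 Median" = c(1200, 800, 150),
  "F635 Mean"   = c(1210, 950, 160),
  "F635 SD"     = c(  90, 700,  40),
  "F532 Median" = c(1000, 820, 140),
  "F532 Mean"   = c(1015, 830, 400),
  "F532 SD"     = c(  80, 100, 600),
  check.names = FALSE  # keep the spaces in the column names
)

# Hypothetical weight function: a channel is OK if mean and median differ
# by less than maxdiff and the pixel s.d. does not exceed the median.
myfun2 <- function(x, maxdiff = 50) {
  ok <- function(ch) {
    med <- x[, paste(ch, "Median")]
    abs(med - x[, paste(ch, "Mean")]) < maxdiff & x[, paste(ch, "SD")] <= med
  }
  # weight 1 only if both the red and the green channel are OK
  as.numeric(ok("F635") & ok("F532"))
}

w <- myfun2(spots)
print(w)
```

Only the first toy spot passes both checks in both channels. If the 
weights look sensible, myfun2 could be passed as wt.fun in place of myfun 
in the read.maimages() call above.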

Gordon

>--Naomi
>
>At 09:28 AM 8/31/2004 +1000, you wrote:
>>At 11:33 PM 30/08/2004, Naomi Altman wrote:
>>>The vignettes are great - perhaps I should not call them 
>>>"tutorials".  But like other documentation of this type (the book "SAS 
>>>for Mixed Models" comes to mind), it is hard to generalize from the 
>>>examples.  We need both the vignettes and the internal 
>>>documentation.  We need good but explicit defaults for the general user, 
>>>and the option to change these defaults for the expert user.
>>>
>>>Here is an example where the documentation is OK, but the option to 
>>>change the defaults is too limited.
>>>
>>>Both limma and marray allow the user to read only a limited set of columns 
>>>from gpr and spot files.  Why not have this as the default, and let the 
>>>user decide if they want to read in other columns? Some of my clients 
>>>like to filter spots based on quantities like the difference between the 
>>>median and mean spot intensity, the sd of intensity, etc.  They 
>>>currently need to flag spots before importing into Bioconductor because 
>>>they cannot read these other columns readily into an marrayRaw object.
>>
>>The wt.fun argument to the read.maimages() function in limma already provides 
>>the capability to filter or weight spots based on any number of columns 
>>in the original file. So there is no need to read in the extra columns or to 
>>flag spots before importing. The computation of the flags is done at the 
>>time of import.
>>
>>The help document for read.maimages() says:
>>      Spot quality weights may be extracted from the image analysis
>>      files using a ready-made or a user-supplied weight function
>>      'wt.fun'. 'wt.fun' may be any user-supplied function which accepts
>>      a data.frame argument and returns a vector of non-negative
>>      weights. The columns of the data.frame are as in the image
>>      analysis output files. See 'QualityWeights' for provided weight
>>      functions.
>>
>>I admit that this is brief, but it does seem explicit.
>>
>>I know that reading in extra columns can be convenient for other 
>>purposes. The reason why I decided not to implement this in limma was 
>>explained in a post to this list on 22 July:
>>https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-July/005434.html
>>
>>Gordon
>>
>>>--Naomi
>>>
>>>Naomi S. Altman                                814-865-3791 (voice)
>>>Associate Professor
>>>Bioinformatics Consulting Center
>>>Dept. of Statistics                              814-863-7114 (fax)
>>>Penn State University                         814-865-1348 (Statistics)
>>>University Park, PA 16802-2111
>>
>


