[BioC] arrayQualityMetrics stringency & co

Audrey Kauffmann audrey at ebi.ac.uk
Thu Oct 16 12:08:16 CEST 2008

Hi Yannick,

- Here is how the outlier detection is performed:
For the MA-plot, the mean of the absolute value of M is computed for 
each array, and arrays whose value lies beyond the boxplot's whiskers 
are flagged as possible outliers. The same approach, 
i.e. using the whiskers of the boxplot, is applied to the following: the 
mean and interquartile range (IQR) from the boxplots and NUSE, the sums 
of the rows of the distance matrix (for the heatmap), and the amplitude 
of low frequencies of the periodogram (for the spatial intensity 
distribution). In the case of the RLE plot, any array with a median RLE 
higher than 0.1 is considered as a possible outlier.
To decide whether or not you should remove some chips from your 
analysis, I advise you to run the report after normalisation. If, after 
normalisation, an array is still flagged with a star in several quality 
assessment sections, I would remove it. Of course, it mainly depends on 
the context. For instance, if there is a good biological reason for an 
array to be an outlier, keep it.
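
As a rough illustration of the whisker rule described above, here is a 
minimal R sketch. It is not the package's actual code: the helper name 
flag_whisker_outliers, the simulated values, and the use of 
boxplot.stats() with its default whisker coefficient of 1.5 are all 
assumptions made for the example.

```r
# Hypothetical helper, not part of arrayQualityMetrics: flag the arrays whose
# per-array summary statistic lies beyond the boxplot whiskers.
flag_whisker_outliers <- function(stat, coef = 1.5) {
  # boxplot.stats()$stats holds: lower whisker, lower hinge, median,
  # upper hinge, upper whisker.
  whiskers <- boxplot.stats(stat, coef = coef)$stats[c(1, 5)]
  stat < whiskers[1] | stat > whiskers[2]
}

# Simulated per-array statistic, e.g. the mean of |M| for six arrays.
mean_abs_m <- c(a1 = 0.21, a2 = 0.19, a3 = 0.22, a4 = 0.20, a5 = 0.23, a6 = 0.95)
names(mean_abs_m)[flag_whisker_outliers(mean_abs_m)]

# The same helper applied to the row sums of a between-array distance matrix.
set.seed(1)
expr <- matrix(rnorm(600), ncol = 6)   # 100 features x 6 simulated arrays
expr[, 6] <- expr[, 6] + 3             # make the last array deviate
d <- as.matrix(dist(t(expr)))
which(flag_whisker_outliers(rowSums(d)))

# RLE rule from the text: a median RLE above 0.1 flags the array.
median_rle <- c(a1 = 0.01, a2 = -0.02, a3 = 0.15)
names(median_rle)[median_rle > 0.1]
```

Because the whisker rule is relative to the other arrays in the same 
report, the set of flagged arrays can change when arrays are added or 
removed, whereas the fixed RLE threshold does not depend on the rest of 
the batch.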

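- To see the "inside" of the arrayQualityMetrics function:
showMethods("arrayQualityMetrics")
gives you the classes for which a method exists. Then you can see the 
function for one of these classes using selectMethod, for instance (the 
class name here is only an example, taken from your NChannelSet data):
selectMethod("arrayQualityMetrics", signature = "NChannelSet")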

However, if you want to modify it, you can download the source of 
the package; the functions are in the directory "arrayQualityMetrics/R".
I am currently working on a new version of the package in which it will be 
easier to adapt the functions and to modify the report. If you are 
interested, you can have a look at the devel branch of Bioconductor; I 
will update the development version of the package soon.
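
To show why printing an S4 generic only displays standardGeneric and an 
environment, and how the method lookup recovers the real code, here is a 
small self-contained sketch. The class ToyArraySet and the generic 
toyQualityReport are hypothetical stand-ins, not part of 
arrayQualityMetrics; only setClass, setGeneric, setMethod, showMethods 
and selectMethod are real functions from the methods package.

```r
library(methods)

# Toy S4 class and generic, both hypothetical, standing in for the input
# classes and the arrayQualityMetrics generic.
setClass("ToyArraySet", representation(values = "numeric"))
setGeneric("toyQualityReport",
           function(object, ...) standardGeneric("toyQualityReport"))
setMethod("toyQualityReport", "ToyArraySet", function(object, ...) {
  mean(object@values)
})

# Printing the generic shows only the dispatcher, not the real code.
toyQualityReport

# List the classes that have a method, then fetch the method for one class.
showMethods("toyQualityReport")
m <- selectMethod("toyQualityReport", signature = "ToyArraySet")
m        # prints the method definition, including its body
body(m)  # just the code of the method
```

The same two calls work on any S4 generic once its package is loaded; 
only the generic name and the class in the signature change.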

- For the missing heatmap, I have sometimes seen that the plot is done 
but for some reason does not show in the report. You can check the files 
heatmap.png and heatmap.pdf in the directory where you created the report.
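
A quick way to check for those files from R is sketched below. The 
variable report_dir is an assumption: set it to whatever you passed as 
outdir (the generic's default is getwd()). The file names are the ones 
mentioned above, with the PDF name assumed to be heatmap.pdf.

```r
# Directory that was passed as 'outdir' to arrayQualityMetrics; getwd() is
# the default shown in the generic's signature. Adjust as needed.
report_dir <- getwd()

# File names as mentioned in the text; whether they exist depends on the run.
heatmap_files <- file.path(report_dir, c("heatmap.png", "heatmap.pdf"))
data.frame(file = basename(heatmap_files),
           exists = file.exists(heatmap_files),
           size_bytes = file.size(heatmap_files))
```

A file that exists but has a size of zero or only a few bytes would 
suggest that the plot device was opened but the plot itself failed.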


Yannick Wurm wrote:
> Morning List,
> I've been toying with arrayQualityMetrics which gives me a great 
> overview of my data without too much work.
> Several things are still unclear to me though:
>     - how does it calculate the '*' marks that indicate that a chip may be 
> bad? How stringent/conservative are they? Because several of my 
> spotted cDNA chips show up as having issues, and now I'm unsure 
> whether or not I should remove them from my analysis.
>     - is there any way to "see inside" the package? I'd like to see 
> how the stringency is calculated, and adapt some of the output 
> formatting. But when I try to look inside, all I get is something 
> called "environment":
>         > arrayQualityMetrics
>         standardGeneric for "arrayQualityMetrics" defined from package 
> "arrayQualityMetrics"
>         function (expressionset, outdir = getwd(), force = FALSE, 
> do.logtransform = FALSE,
>             split.plots = FALSE, intgroup = "Covariate")
>         standardGeneric("arrayQualityMetrics")
>         <environment: 0x4882ad8>
>         Methods may be defined for arguments: expressionset, outdir, 
> force, do.logtransform, split.plots, intgroup
>         Use  showMethods("arrayQualityMetrics")  for currently 
> available ones.
>     - on one NChannelSet containing 280 slides, arrayQualityMetrics 
> didn't calculate the heatmap. But didn't display any error messages 
> either. Possibly because of memory constraints on my 3GB Mac?
> Thanks for putting me on the track to resolving this.
> Best,
> Yannick
> --------------------------------------------
>          yannick . wurm @ unil . ch
> Ant Genomics, Ecology & Evolution @ Lausanne
>   http://www.unil.ch/dee/page28685_fr.html

Audrey Kauffmann
Cambridge UK
