[BioC] arrayQualityMetrics stringency & co
audrey at ebi.ac.uk
Thu Oct 16 12:08:16 CEST 2008
- Here is how the outlier detection is performed:
For the MA-plot, the mean of the absolute value of M is computed for
each array, and arrays whose value lies beyond the extremes of the
boxplot's whiskers are considered possible outliers. The same approach,
i.e. using the whiskers of the boxplot, is applied to the following: the
mean and interquartile range (IQR) from the boxplots and NUSE, the sums
of the rows of the distance matrix (for the heatmap), and the amplitude
of low frequencies of the periodogram (for the spatial intensity
distribution). In the case of the RLE plot, any array with a median RLE
higher than 0.1 is considered as a possible outlier.
To decide whether or not you should remove some chips from your
analysis, I advise you to run the report after normalisation. If, after
normalisation, an array is still flagged with a star in several quality
assessment sections, I would remove it. Of course, it mainly depends on
the context. For instance, if there is a good biological reason for an
array to be an outlier, keep it.
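The whisker rule described above can be sketched in a few lines of R. This is only an illustration, not the package's own code: the per-array scores are made-up numbers, whisker_outliers is a hypothetical helper name, and the use of boxplot.stats with its default coefficient of 1.5 is an assumption about how the whiskers are defined.

```r
# Made-up per-array summary scores (e.g. mean |M| from the MA-plots).
scores <- c(a1 = 0.21, a2 = 0.19, a3 = 0.23, a4 = 0.20,
            a5 = 0.22, a6 = 0.18, a7 = 0.61, a8 = 0.20)

# Hypothetical helper: names of the arrays lying beyond the boxplot whiskers.
# boxplot.stats() returns those points in $out (default coef = 1.5).
whisker_outliers <- function(x) {
  names(x)[x %in% boxplot.stats(x)$out]
}

flagged <- whisker_outliers(scores)

# RLE rule from the text: a fixed cut-off of 0.1 on the per-array median.
rle_medians <- c(a1 = 0.01, a2 = -0.02, a3 = 0.15, a4 = 0.00)
rle_flagged <- names(rle_medians)[rle_medians > 0.1]
```

With these numbers, only a7 lies beyond the whiskers and only a3 exceeds the RLE cut-off. Because the whiskers depend on the spread of all arrays in the set, the same array can be flagged in one experiment and not in another.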
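- To see the "inside" of the arrayQualityMetrics function:
    showMethods("arrayQualityMetrics")
gives you the classes for which a method exists. Then you can see the
function for one of these classes using selectMethod, for instance for
an NChannelSet:
    selectMethod("arrayQualityMetrics", signature = "NChannelSet")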
However, if you want to modify it, you can download the source of
the package; the functions are in the directory "arrayQualityMetrics/R".
I am currently working on a new version of the package in which it will
be easier to adapt the functions and to modify the report. If you are
interested, you can have a look at the devel branch of Bioconductor; I
will update the development version of the package soon.
- For the missing heatmap, I have sometimes seen that the plot is
produced but for some reason does not show in the report. You can check
for the files heatmap.png and heatmap.pdf in the directory where you
created the report.
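A quick way to check for the heatmap files from within R is sketched below. check_report_files is a hypothetical helper, and the assumption is that the report was written to the current working directory (the default outdir = getwd() shown in the generic); adjust outdir if you passed another directory.

```r
# Hypothetical helper: report whether the expected plot files exist in the
# report directory, and how large they are (NA if a file is missing).
check_report_files <- function(outdir,
                               files = c("heatmap.png", "heatmap.pdf")) {
  paths <- file.path(outdir, files)
  data.frame(file   = files,
             exists = file.exists(paths),
             bytes  = file.size(paths))
}

# Assumption: the report was created in the working directory.
check_report_files(getwd())
```

If the files exist and are not empty, the heatmap was computed and only the link from the report is missing.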
Yannick Wurm wrote:
> Morning List,
> I've been toying with arrayQualityMetrics which gives me a great
> overview of my data without too much work.
> Several things are still unclear to me though:
> - how does it calculate the '*' that indicate that a chip may be
> bad. How stringent/conservative are they? Because several of my
> spotted cDNA chips show up as having issues, and now I'm unsure
> whether or not I should remove them from my analysis.
> - is there any way to "see inside" the package? I'd like to see
> how the stringency is calculated, and adapt some of the output
> formatting. But when I try to look inside, all I get is something
> called "environment":
> > arrayQualityMetrics
> standardGeneric for "arrayQualityMetrics" defined from package
> function (expressionset, outdir = getwd(), force = FALSE,
> do.logtransform = FALSE,
> split.plots = FALSE, intgroup = "Covariate")
> <environment: 0x4882ad8>
> Methods may be defined for arguments: expressionset, outdir,
> force, do.logtransform, split.plots, intgroup
> Use showMethods("arrayQualityMetrics") for currently
> available ones.
> - on one NChannelSet containing 280 slides, arrayQualityMetrics
> didn't calculate the heatmap. But didn't display any error messages
> either. Possibly because of memory contraints on my 3GB mac?
> Thanks for putting me on the track to resolving this.
> yannick . wurm @ unil . ch
> Ant Genomics, Ecology & Evolution @ Lausanne
EMBL - EBI