[BioC] XPS: new quality control features (e.g. residual images)

cstrato cstrato at aon.at
Fri Apr 15 17:26:58 CEST 2011


Dear all,

Let me first thank Dan Tenenbaum and Herve Page for their continuous 
support of the special requirements of xps. It is their credit that the 
daily multiple platform build/check works so seamlessly.

Now I am pleased to announce that the new release of "xps" does finally 
support many quality control features, asked for by many users for quite 
some time: These quality control features are based on the new S4 class 
"QualTreeSet", which adds quality control features for the raw 
intensities, background-corrected and normalized intensities, as well as 
to the summarization.


Concretely, the following QC methods are new:

- borderplot(): boxplot of positive and negative border elements of arrays.

- coiplot(): "Center of Intensity" plots of positive and negative border 
elements.

- corplot(): a heat map of the array-array Spearman rank correlation 
coefficients.

- madplot(): an array-array expression level distance plot (heat map).

- nuseplot(): NUSE boxplots for raw data, background adjusted and 
normalized intensities.

- pcaplot(): a PCA plot of the first two principle components of 
expression levels or their correlations.

- rleplot(): RLE boxplots for raw, background adjusted and normalized 
expression levels.

- image(): a heat map of residual images (both residuals and weights).

- YES, you can draw residual images for Exon ST arrays, Gene ST arrays 
and plate arrays!

- plotAffyRNAdeg(): Affymetrix RNA degradation plots.

- YES, you can draw RNA degradation plots for Exon ST arrays, Gene ST 
arrays and plate arrays!

- Both, residual images and RNA degradation plots can be created for the 
raw intensities, background corrected intensities and normalized 
intensities (see parameter 'qualopt').


Furthermore, improvements in the handling of data, especially of large 
datasets, have been made in the C++ code:

- treeInfo(): many tree data such as min/max intensities, quantiles, 
percent P/M/A calls are
               now preprocessed during creation of the root trees and 
are stored in the trees
               as tree info ('userinfo'); this information allows to 
draw boxplots, callplots, etc much faster,
               and is accessible by calling method treeInfo().


The following QC methods have been improved:

- boxplot(): it is now possible to draw boxplots of raw data w/o the 
need to attachInten() first.

- callplot(): barplots of percent P/M/A calls now access the preprocessd 
tree info.

- image(): it is now possible to draw images of intensities w/o the need 
to attachInten() first.


The following functions are improved replacements of similar existing 
functions:

- plotBoxplot(): allows to easily save boxplots as png, jpeg, etc files 
(will replace function boxplot.dev()).

- plotImage(): allows to easily save one or more images as png, jpeg, 
etc files (will replace function image.dev()).


The following QC methods have not been changed:

- hist(): density plots of raw data and expression levels still depend 
on a non-empty slot 'data', thus for large datasets it is still 
necessary to use function root.density().

- mvaplot(): scatter plot of M vs A values.

- pmplot(): barplot of PM and MM intensities still depends on a 
non-empty slot 'data'.


Please note that S4 class "QualTreeSet" is still work in progress, there 
are still missing features, missing documentation, and the current 
features are not tested extensively. For this reason there may be still 
some hidden problems/bugs.

If you experience any problems, please report them as follows:
- run R from the console, i.e. RTerm, xterm
- set "verbose=TRUE" in all functions
- send me the complete output of the console
- add your sessionInfo()
- add the ROOT version that you use (should be root_v5.27.04)


As an example you can find below some demo code for the HuGene ST array.

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
V.i.e.n.n.a           A.u.s.t.r.i.a
e.m.a.i.l:        cstrato at aon.at
_._._._._._._._._._._._._._._._._._


#------------------------------------------------------------------------------#

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Import raw data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

### new R session: load library xps_1.11.12
library(xps)

### define directories:
scmdir <- "/Volumes/GigaDrive/CRAN/Workspaces/Schemes/na31"
celdir <- "/Volumes/GigaDrive/ChipData/Exon/HuGene"
datdir <- getwd()

### import raw data (e.g. Affymetrix human tissue dataset for HuGene ST 
arrays)
scheme.gene <- root.scheme(file.path(scmdir, "hugene10stv1.root"))
celfiles <- 
c("TisMap_Breast_01_v1_WTGene1.CEL","TisMap_Breast_02_v1_WTGene1.CEL",
 
"TisMap_Breast_03_v1_WTGene1.CEL","TisMap_Prostate_01_v1_WTGene1.CEL",
 
"TisMap_Prostate_02_v1_WTGene1.CEL","TisMap_Prostate_03_v1_WTGene1.CEL")
celnames <- 
c("Breast01","Breast02","Breast03","Prostate01","Prostate02","Prostate03")
data.gene <- import.data(scheme.gene, "TisMapData", filedir=datdir, 
celdir=celdir, celfiles=celfiles, celnames=celnames)
str(data.gene)

### show data stored in "userinfo" of ROOT trees, see "?treeInfo"
info <- treeInfo(data.gene, treetype="cel", varlist="*")

### boxplot of raw data (taking advantage of preprocessed quantiles in 
"userinfo")
boxplot(data.gene, which="userinfo:fIntenQuant")

### draw whiskers at 10%, 90% and show min/max values as outliers
boxplot(data.gene, which="userinfo:fIntenQuant", range=1.5)

### images of raw data (no longer necessary to attachInten())
names <- unlist(treeNames(data.gene))
image(data.gene, names=names[4], add.legend=TRUE)

### save image as png-file
outfile <- paste("Image_", names[4], sep="")
plotImage(data.gene, type="intensity", names=names[4], dev="png", 
outfile=outfile)

### density plot
outfile <- paste("DensityPlot_", names[4], sep="")
root.density(data.gene, canvasname=outfile, save.as="png")

### the following plots still require attachInten()
data.gene <- attachInten(data.gene)
# density plot
hist(data.gene, which="core")
# PM/MM plot
pmplot(data.gene)
data.gene <- removeInten(data.gene)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Quality control
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

### new R session
library(xps)

### qualification - RLM
rlm.all <- fitRLM(data.gene, "TisMapRLM_allcore", filedir=datdir, tmpdir="",
                   background="antigenomic", normalize=TRUE, qualopt="all",
                   option="transcript", exonlevel="core+affx", 
add.data=FALSE)

### show data stored in "userinfo" of ROOT trees
info <- treeInfo(rlm.all, treetype="rlm", varlist="*")

### boxplot of positive/negative border elements
borderplot(rlm.all, transfo=NULL)
borderplot(rlm.all, type="pos", transfo=NULL)
borderplot(rlm.all, type="neg", transfo=NULL, ylim=c(0,1000))

### Center-of-Intensity plots for positive/negative border elements
coiplot(rlm.all)
coiplot(rlm.all, type="pos", )
coiplot(rlm.all, type="neg")

### NUSE plot
nuseplot(rlm.all, names="namepart")
# for raw data only (usual case)
nuseplot(rlm.all, names="namepart:raw")

### RLE plot
rleplot(rlm.all, names="namepart")
# for normalized data only (usual case)
rleplot(rlm.all, names="namepart:normalized")

### residual images of raw data
resname <- getTreeNames(rootFile(rlm.all), treetype="res")
image(rlm.all, type="resids", names=resname[4], add.legend=TRUE)
image(rlm.all, type="pos.resids", names=resname[4])
image(rlm.all, type="neg.resids", names=resname[4])
image(rlm.all, type="sign.resids", names=resname[4])
image(rlm.all, type="weights", names=resname[4], add.legend=TRUE)

### save residual image as png-file
outfile <- paste("ResidualImage_", resname[4], sep="")
plotImage(rlm.all, type="weights", names=resname[4], dev="png", 
outfile=outfile)

### residual images of background adjusted data
image(rlm.all, type="weights", qualopt="adjusted", names=resname[10], 
add.legend=TRUE)
### residual images of normalized data
image(rlm.all, type="weights", qualopt="normalized", names=resname[16], 
add.legend=TRUE)

### get RNA degradation
rnadeg <- AffyRNAdeg(rlm.all)
result <- summaryAffyRNAdeg(rnadeg)

### plot RNA degradation
plotAffyRNAdeg(rnadeg, add.legend=TRUE)

### plot slope of RNA degradation
plotAffyRNAdeg(rnadeg, summary=TRUE)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Preprocessing: RMA
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

### new R session
library(xps)

### RMA
data.rma <- rma(data.gene, "TisMapRMAcore", filedir=datdir, tmpdir="",
                 background="antigenomic", normalize=TRUE, 
option="transcript",
                 exonlevel="core+affx")

### show data stored in "userinfo" of ROOT trees
info <- treeInfo(data.rma, treetype="mdp", varlist="*")

### boxplot of normalized expression levels
boxplot(data.rma, which="userinfo:fLevelQuant")

### density plot
hist(data.rma)

### RLE plot
rleplot(data.rma)
mboxplot(data.rma, ylim=c(-3,3))

### M vs A plots
mvaplot(data.rma, pch=20, ylim=c(-4,4))


### Detection Above Background Call (DABG)
call.dabg <- dabg.call(data.gene,"TisMapDABGcore",filedir=datdir, 
exonlevel="core+affx")
# note: alpha1 and alpha2 need to be adjusted to get usable P/M/A calls

### show data stored in "userinfo" of ROOT trees
info <- treeInfo(call.dabg, treetype="dab", varlist="*")

### detection call plots
callplot(call.dabg)
callplot(call.dabg, beside=FALSE)

#---------------------------------------------------------------------#



More information about the Bioconductor mailing list