[BioC] probeset level visualization with standard deviation...

Martin Morgan mtmorgan at fhcrc.org
Thu Feb 21 04:00:02 CET 2008

Hi Paul -- Someone will probably contribute a one line solution to
your problem, and be able to provide examples more directly related to
the packages you are using. However...

If you have a matrix m and an index idx, as you might with

> library(Biobase)
> data(sample.ExpressionSet)
> obj <- sample.ExpresssionSet # save typing later
> m <- exprs(obj)
> m <- m[apply(m, 1, function(row) all(row>0)),]
> m <- log(m) # log expression values, no NA's for simplicity below
> idx <- sample(1:10, nrow(m)) # fake 'probe' values

then to calculate the mean for each group represented by idx it is
actually quite easy:

> probeset <- rep(idx, ncol(m))
> meanByProbe <- tapply(m, probeset, mean)

and similarly for variances or whatever other measure you'd like. You
can think of this as 'flattening' m into a long vector, one column
after another, then splitting the vector into groups based on a
similarly-lengthed vector created by replicating 'idx', and finally
calculating the mean for each group.

For the more complicated summary I think you're after, you might
(using median rather than mean, since these will be more robust
measures of central tendency for non-normal data)

> probeset <- rep(idx, ncol(m))
> trt <- rep(obj\$type, each=nrow(m))
> meds <- tapply(m, list(probeset, trt), median)
> meds
Case  Control
1  4.857189 4.894641
2  4.584452 4.671403
3  4.491588 4.556983
4  4.835686 4.769879
5  4.305809 4.432940
6  5.006192 5.022188
7  4.727795 4.652226
8  4.994262 5.153690
9  4.878079 5.000073
10 4.717114 4.786241

A measure of dispersion, like the median average deviation, can be
calculated in a similar way:

A simple

> plot(meds)

gives a scatter plot of this data.

I'm a fan of lattice. The equivalent of the above plot command might
be

> library(lattice)
> with(df, xyplot(Case~Control))

If you were interested in just one group, I'd suggest a
box-and-whiskers plot rather than summarizing the data as mean +
standard error, something along the lines of

> df <- data.frame(exprs=as.vector(m), probeset=probeset)
> with(df, bwplot(probeset~exprs))

Confidence measures in each direction are definitely more complicated,
and one might quickly become satisfied with something like

> df <- data.frame(exprs=as.vector(m), probeset=probeset, treatment=trt)
> with(df, bwplot(probeset~exprs|treatment))

which draws two different bwplot 'panels', one for each treatment, or

> with(df, bwplot(probeset~exprs|treatment,
+                 panel=function(...) {
+                     panel.bwplot(...)
+                     panel.abline(v=median(exprs))
+                 }))

which draws the bwplot and then adds a vertical line corresponding to
the median over all treatments (providing a useful visual reference
for comparison). However, there's a demo that you can follow along
with at

> file.show(system.file("demo/intervals.R", package = "lattice"))

The idea is that lattice displays plots in a 'panel'. This function
from the demo

prepanel.ci <- function(x, y, lx, ux, subscripts, ...)
{
x <- as.numeric(x)
lx <- as.numeric(lx[subscripts])
ux <- as.numeric(ux[subscripts])
list(xlim = range(x, ux, lx, finite = TRUE))
}

sets up the panel so that the x-axis limits are appropriate ('lx',
'ux' are vectors of lower and upper limits for each x point; you'd
have to add a your own variables, e.g. ly, uy, to the function
signature, and a ylim element to the list for error bars on both
points). This function from the demo

panel.ci <- function(x, y, lx, ux, subscripts, pch = 16, ...)
{
x <- as.numeric(x)
y <- as.numeric(y)
lx <- as.numeric(lx[subscripts])
ux <- as.numeric(ux[subscripts])
panel.abline(h = unique(y), col = "grey")
panel.arrows(lx, y, ux, y, col = 'black',
length = 0.25, unit = "native",
angle = 90, code = 3)
panel.xyplot(x, y, pch = pch, ...)
}

does the actual display of the points (panel.xyplot) and of the
confidence interval (panel.arrows, see ?arrows for some hints). You'd
add another panel.arrows to display the second set of confidence
intervals. Finally, from the demo,

with(singer.ucl,
xyplot(voice.part ~ median,
lx = lower, ux = upper,
prepanel = prepanel.ci,
panel = panel.ci))

invokes xyplot with the appropriate prepanel and panel functions, and
with a data frame (singer.ucl) that, by this point in the example, has
columns lower and upper describing where the confidence bounds are.

Too much info, I know, but hopefully useful.

Martin

Paul Hammer <hammer_p at molgen.mpg.de> writes:

> hi members,
>
> i am looking for a function which plots me average probeset levels with
> standard deviations (e.g. tissues 1 against tissue 2). Is there any
> functions which could do this. i work with affymetrix chips which have
> often three times determination.
>
> an example for a better understanding:
>
> after using the functions read.exon() and rma() i specify my probeset_ids:
>
>  > probeset = gene.to.probeset("ensemle_gene_id")
>  > exon_probesets = select.probewise(probeset, filter="exonic")
>
> now i would like to have the average values with the standard deviation
> for these probeset_id log levels to plot e.g. two tissues (here heart
> and breast) against later.
>
> probeset_id    heart_A.CEL       heart_B.CEL    heart_C.CEL
> breast_A.CEL    breast_B.CEL    breast_C.CEL
> 1.                   6.5432                6.7833              6.3234
>             4.6566               5.2121               4.7654
> 2.                   5.8422                6.0813              6.2334
>             5.2526               5.1236               4.9234
> 3.                   5.8422                6.0813              6.2334
>             5.2526               5.1236               4.9234
>
> is there any simple way? till now i extract all data from a data.frame
> given by splicing.index(), calculate average and sd for every
> probeset_id and tissue, which is quite complicated.
>
> thanks
> paul
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793