[BioC] FlowCore/FlowViz issues

Mon Sep 13 16:51:55 CEST 2010

Hi,

[Apologies if this is delivered twice; my initial mail didn't
appear to be accepted for several hours, and I didn't see any
failure message; does the list object to GPG signatures?]

I've just started to use flowCore/flowViz to analyse some of my
flow cytometry data, and ran into a few problems.  I'm hoping
that you might be able to point me in the right direction!

I've been very pleased with it so far, and have got some nice
plots and stats out of it, but I'm sure I'm doing some things
very inefficiently and/or incorrectly!

[I'm using R version 2.11.1 (2010-05-31) on x86_64-pc-linux-gnu
(Debian GNU/Linux) with current Bioconductor packages.]

1) Gating with an ellipsoidGate

cov <- matrix(c(400000000, 0, 0, 0.08), ncol=2,
              dimnames=list(c("FS.Lin", "SS.Log"), c("FS.Lin", "SS.Log")))
mean <- c("FS.Lin"=32000, "SS.Log"=2.8)
cells <- ellipsoidGate(filterId="CellGate", .gate=cov, mean=mean)

I want to select my cells using an ellipsoid gate on forward- and side-
scatter plots.  In the above situation, they lie in a region where
FS=32000±10000 and log₁₀(SS)=2.8±0.5.  However, the values in the
covariance matrix don't obviously correspond to those extents.  Is there
any explanation of how to construct a covariance matrix from the actual
dimensions I want?  (I got the above by trial and error until it fitted
nicely--I'm afraid I know little about these matrices.)

In some plots I'd also like to rotate the ellipse, but I'm not sure
how to put this into the matrix, if that's the way to do things.
Is this possible?

Is there an alternative constructor to create a gate from real
dimensions?
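
For reference, a rough sketch of how such a matrix could be built from
real dimensions.  It assumes the gate boundary is the set of points at
Mahalanobis distance 1 from the mean (an assumption about the
ellipsoidGate default, not something confirmed here).  Under that
assumption an axis-aligned ellipse with half-widths a and b has
covariance diag(a^2, b^2), and a rotation R is applied as R D t(R).
The helper name ellipse_cov is hypothetical.

```r
# Hypothetical helper: covariance matrix for an ellipse with semi-axes
# a and b, rotated anticlockwise by theta radians (in data units).
# Assumes the boundary lies at Mahalanobis distance 1 from the mean.
ellipse_cov <- function(a, b, theta = 0, channels = NULL) {
  rot <- matrix(c(cos(theta), sin(theta),
                  -sin(theta), cos(theta)), ncol = 2)
  m <- rot %*% diag(c(a^2, b^2)) %*% t(rot)
  if (!is.null(channels)) dimnames(m) <- list(channels, channels)
  m
}

# Half-widths of 10000 (FS) and 0.5 (log10 SS), no rotation
cov_fs_ss <- ellipse_cov(10000, 0.5, channels = c("FS.Lin", "SS.Log"))
```

The result could then be passed as .gate in place of the hand-tuned
matrix.  Because the two channels are on very different scales, theta
is an angle in data units and will not match the angle seen on a plot.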

2) Plotting on a log scale

Currently I'm transforming my "Log" data with a log₁₀ transform:

flowset <- transform(flowset, `SS.Log` = log10(`SS.Log`))
flowset <- transform(flowset, `FL.1.Log` = log10(`FL.1.Log`))

When plotting, I have the log values on the axes.  However, is
it possible to either:

  a) plot on logarithmic axes with the appropriate log binning, or
  b) change the axis labels to be a log scale (even though the
     axes are linear)?

I can do this with regular R plots, but I'm not sure if it's possible
with lattice graphics.
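
A minimal sketch of option (b) in plain lattice, using the scales
argument to relabel linear axes holding log10 values.  The data frame
is a stand-in, and whether the flowViz xyplot method forwards scales
in the same way is an assumption.

```r
# Tick positions in log10 units (the data are already log10-transformed)
at <- 0:4
# Labels drawn as powers of ten: 10^0, 10^1, ...
labels <- parse(text = paste0("10^", at))

if (requireNamespace("lattice", quietly = TRUE)) {
  # Stand-in data; real values would come from the flowSet
  set.seed(1)
  d <- data.frame(fs = runif(200, 0, 64000), ss = rnorm(200, 2.8, 0.3))
  p <- lattice::xyplot(ss ~ fs, data = d,
                       scales = list(y = list(at = at, labels = labels)),
                       xlab = "FS", ylab = "SS")
  # print(p) draws the plot
}
```

For option (a), plain lattice accepts scales = list(y = list(log = 10))
on untransformed data.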

3) Plotting with lattice/cairo_pdf is broken

cairo_pdf("fsss.pdf", width=8, height=8, pointsize=12, antialias="none")
xyplot(`SS.Log` ~ `FS.Lin`, flowset, filter=cells, xlab="FS", ylab=expression(log[10]~(SS)))
dev.off()

This is a weird one.  If I type or paste the above into an interactive R
shell, the plot is written as a PDF file.  It works just fine.
If I source it with 'source("script.R")', it does not.  The PDF file is
never created/replaced, and from the speed of it and putting print()
around each call, it looks like the plotting is entirely skipped.

I can't reproduce this with regular plots, so maybe it's the lattice
graphics rather than cairo_pdf?
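
One possible cause, not verified against flowViz: lattice functions
return a "trellis" object and only draw when that object is printed.
At the interactive prompt the result is auto-printed; inside source()
it is not, unless print() wraps the plotting call itself.  A sketch
with plain lattice, using pdf() and the built-in cars data as
stand-ins:

```r
library(lattice)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 8)          # stand-in for cairo_pdf()
p <- xyplot(dist ~ speed, data = cars)   # builds the object, draws nothing
print(p)                                 # the drawing happens here
invisible(dev.off())
```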

4) Overlaying density plots

By default, density plots are stacked down the y-axis with a user-
specified overlap.  Is it possible to directly overlay one-
dimensional histograms on top of each other to allow direct
comparison?
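
A sketch of the kind of overlay meant here, in plain lattice rather
than flowViz: one channel from several samples in a long-format data
frame, with groups= putting the curves in a single panel.  The
simulated values and column names are stand-ins.

```r
library(lattice)

set.seed(1)
samples <- c("unstained", "isotype", "ng2")
# Stand-in for one channel extracted from three samples
d <- data.frame(
  sample = factor(rep(samples, each = 500), levels = samples),
  fl1    = c(rnorm(500, 0.4, 0.2), rnorm(500, 0.5, 0.2),
             rnorm(500, 2.5, 0.4))
)

# groups= overlays one density curve per sample in the same panel
p <- densityplot(~ fl1, data = d, groups = sample,
                 plot.points = FALSE, auto.key = TRUE,
                 xlab = "log10(FL1)")
# print(p) draws the plot
```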

5) Updating phenoData

I need to add certain information to my flowSet, so I'm currently
doing this by getting the phenoData, adding the missing bits and
setting it back:

sampleNames(flowset) <- c("unstained", "isotype", "ng2")

pd <- pData(phenoData(flowset))
pd["isotype"] <- c(FALSE, TRUE, FALSE)
pd["description"] <- c("Unstained", "Isotype", "NG2")

pData(phenoData(flowset)) <- pd
varMetadata(phenoData(flowset))["labelDescription"] <- c("Name", "Isotype", "Description")

This is quite long-winded.  Is there a convenience function present
(or planned?) to add a single parameter which could update the
phenoData and varMetadata at the same time?  It would ensure they
never get out of sync.
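
To illustrate the sort of helper meant, here is a sketch on plain data
frames standing in for pData(phenoData(flowset)) and
varMetadata(phenoData(flowset)).  The name add_pheno_column is
hypothetical and nothing here is an existing flowCore or Biobase
function.

```r
# Hypothetical helper: add one column to the phenotype table and its
# label to the metadata table in one step, so the two stay in sync.
add_pheno_column <- function(pd, vm, name, values, label) {
  stopifnot(length(values) == nrow(pd))
  pd[[name]] <- values
  vm[name, "labelDescription"] <- label
  list(pData = pd, varMetadata = vm)
}

# Stand-in tables with a single existing column, "name"
ids <- c("unstained", "isotype", "ng2")
pd <- data.frame(name = ids, row.names = ids, stringsAsFactors = FALSE)
vm <- data.frame(labelDescription = "Name", row.names = "name",
                 stringsAsFactors = FALSE)

res <- add_pheno_column(pd, vm, "isotype", c(FALSE, TRUE, FALSE), "Isotype")
```

The two returned tables would then be assigned back to the flowSet as
in the code above.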

6) fsApply for a single column

There's a convenience function, each_col, to apply a function to each column in the
flowSet, but is it possible to just access a single column, or set of
specified columns?  I wrote this little helper:

capply <- function (col, func) {
  function(x) { func(x@exprs[,col]) }
}

which can then be used like so:

fsApply(flowset, capply("FL.1.Log", median))
fsApply(flowset, capply("FL.1.Log", sd))

to get the median and sd for a single channel.  However, it
doesn't work for mean, and I'm not sure why:

fsApply(set2, capply("FL.1.Log", mean))
Error in FUN(if (use.exprs) exprs(y) else y, ...) : 
  could not find function "func"

If I call it directly, it works just fine:

mean(flowset[[1]]@exprs[,"FL.1.Log"])
[1] 0.4648052

It's not clear to me what the difference is here between the
two forms.  Is there a built-in or better way to achieve the
same thing?
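
A likely explanation, assuming the `mean` vector from section 1 is
still in the workspace: when mean is passed as an argument it is looked
up as an ordinary variable, so func becomes that numeric vector rather
than base::mean.  Calling func(...) then searches for a function named
func and finds none.  A direct call such as mean(x) still works because
R skips non-function bindings in call position.  The sketch below
reproduces this with a plain matrix in place of a flowFrame.

```r
capply <- function(col, func) {
  function(x) func(x[, col])
}

m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2,
            dimnames = list(NULL, c("a", "b")))

mean <- c(a = 32000)   # a data vector that shadows the name `mean`

# `mean` as an argument resolves to the vector, so func is not a function
msg <- tryCatch(capply("a", mean)(m),
                error = function(e) conditionMessage(e))

# Naming the function explicitly avoids the shadowing
ok <- capply("a", base::mean)(m)

rm(mean)   # removing (or renaming) the vector also fixes the original call
```

Renaming the vector in section 1 to something other than mean would
have the same effect.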

7) Getting at filter summary data

> positive <- filter(flowset, pos)
> summary(positive)
filter summary for frame 'isotype'
 positive+: 111 of 9186 events (1.21%)

filter summary for frame 'pos'
 positive+: 12070 of 13246 events (91.12%)

Is it possible to actually get at the raw data here?  i.e.
the raw event number and/or the percentages.  I'm currently
doing it the "hard way", by:

newset <- Subset(flowset, pos)
fsApply(newset, capply("FL.1.Log", length)) / fsApply(flowset, capply("FL.1.Log", length)) * 100

But there must be an easier and more efficient way of extracting this
information!  Looking at the filter object, I can only extract the
strings as printed.

Many thanks for all your help,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.