[BioC] FlowCore/FlowViz issues

Wed Sep 15 00:18:31 CEST 2010

Hi Roger,

Answers to your questions are posted below.

Nishant

1) Gating with an ellipsoidGate

Is there any explanation regarding how to construct a covariance matrix
from the actual
dimension I want ?. Is there an alternative constructor to create a gate
from real
dimensions?

Ans:
As mentioned by Josef, the decision to represent gates using covariance
matrix
was made because the standards for definition of gates for flow
cytometry (gatingML specification)
decided to make use of covariance matrices as input.

I do agree that an alternative constructor that takes in the half axes,
center
point etc would be more intuitive to use. I will try to add a method for
that.
Until then please make use of the spreadsheet provided by Josef.

2) Plotting on a log scale
When plotting, I have the log values on the axes.  However, is it
possible to either:
a) plot on logarithmic axes with the appropriate log binning, o
b) change the axis labels to be a log scale (even though the
               axes are linear)?
I can do this with regular R plots, but I'm not sure if it's possible
with lattice
graphics.

Ans:
Yes it is possible with lattice graphics as well.
Please have a look at figure 8.4 and 8.5 and the accompanying code for
generating the plots from Deepayans book at
http://lmdvr.r-forge.r-project.org/
That should probably get close to what you are looking for.

3) Plotting with lattice/cairo_pdf is broken

cairo_pdf("fsss.pdf", width=8, height=8, pointsize=12, antialias="none")
xyplot(`SS.Log` ~ `FS.Lin`, set, filter=cells, xlab="FS",
        ylab=expression(log[10]~(SS)))
dev.off()

This is a weird one.  If I type or paste the above into an interactive R
shell, the plot is written as a PDF file.  It works just fine.
If I source it with 'source("script.R")', it does not.  The PDF file is
never created/replaced, and from the speed of it and putting print()
around each call, it looks like the plotting is entirely skipped.

I can't reproduce this with regular plots, so maybe it's the lattice
graphics rather than cairo_pdf?

Ans:
try the above statements, but include
print(xyplot(`SS.Log` ~ `FS.Lin`, set, filter=cells, xlab="FS",
        ylab=expression(log[10]~(SS))))
instead and see if it works. 
You need to call print on trellis objects to get it to work.

4) Overlaying density plots
By default, density plots are stacked down the y-axis with a user-
specified overlap.  Is is possible to directly overlay one-
dimensional histograms on top of each other to allow direct
comparison? 

Ans:
I do not think that there is an existing function available in flowViz
that let
you stack them up.

If you want to use lattice, you will have to reshape your data a bit to
get it to work.

nms <- sampleNames(dat)
k <- lapply(nms, function(x){
         nr <- nrow(exprs(dat[[x]]))
         data.frame("FSC-H" = exprs(dat[[x]])[,"FSC-H"], samp =rep(x, nr),
             check.names =FALSE)
        })
df <- do.call(rbind, k)
densityplot( ~ `FSC-H`, df, groups = samp)

Here is a simple way of doing that without lattice
data(GvHD)
dat <- GvHD[1:3]
plot(density(exprs(dat[[1]])[,"FSC-H"], n =512), col= col[1])
col <- c("#FF0000", "#00FF00", "#0000FF")
for (i in 1:3)
 polygon(density(exprs(dat[[i]])[,"FSC-H"], n =512), col = col[i] )

5) Updating phenoData
I need to add certain information to my flowSet, so I'm currently
doing this by getting the phenoData, adding the missing bits and
setting it back:

sampleNames(flowset) <- c("unstained", "isotype", "ng2")

pd <- pData(phenoData(flowset))
pd["isotype"] <- c(FALSE, TRUE, FALSE)
pd["description"] <- c("Unstained", "Isotype", "NG2")

pData(phenoData(flowset)) <- pd
varMetadata(phenoData(set))["labelDescription"] <- c("Name", "Isotype",
        "Description")

This is quite long-winded.  Is there a convenience function present
(or planned?) to add a single parameter which could update the
phenodata and varmetadata at the same time?  It would ensure they
never get out of sync.

Ans: I am not quite sure I understand your questions here .
There are update method for pData, phenoData, varLabels, varMetadata etc
that directly work with flowSets. Please see if these methods or their
update methods (<-) are what you are looking for .

data(GvHD)
pData(GvHD)
varLabels(GvHD)
varMetadata(GvHD)

Update methods
varLabels(GvHD) <- c("a", "b", "c", "d", "e")

There's a convenience each_col to apply a function to each column in the
flowSet, but is it possible to just access a single column, or set of
specified columns?  I wrote this little helper:

capply <- function (col, func) {
      function(x) { func(x at exprs[,col]) }
  }

  which can then be used like so:

  fsApply(flowset, capply("FL.1.Log", median))
  fsApply(flowset, capply("FL.1.Log", sd))

   to get the median and sd for a single channel.  However, it
  doesn't work for mean, and I'm not sure why:

  fsApply(set2, capply("FL.1.Log", mean))
  Error in FUN(if (use.exprs) exprs(y) else y, ...) :
    could not find function "func"

    If I call it directly, it works just fine:

    mean(flowset[[1]]@exprs[,"FL.1.Log"])
    [1] 0.4648052

    It's not clear do me what the difference is here between the
    two forms.  Is there a built-in or better way to achieve the
    same thing?

 ANS: If you are looking for median and median statistics for a flowSet
 wont the summary method give you that ??
 data(GvHD)
 summary(GvHD)

 If you want to modify your code to get it to work, try this
 fsApply(GvHD, function(x){
           median(exprs(x)[,"FSC-H"])
         })

7) Getting at filter summary data

> > positive <- filter(flowset, pos)
> > summary(positive)
filter summary for frame 'isotype'
 positive+: 111 of 9186 events (1.21%)

 filter summary for frame 'pos'
  positive+: 12070 of 13246 events (91.12%)

  Is it possible to actually get at the raw data here?  i.e.
  the raw event number and/or the percentages.  I'm currently
  doing it the "hard way", by:

  newset <- Subset(flowset, pos)
  fsApply(newset, capply("FL.1.Log", length)) / fsApply(flowset,
          capply("FL.1.Log", length)) * 100

  But there must be an easier and more efficient way of extracting this
  information!  Looking at the filter object, I can only extract the
  strings as printed.

  Ans:

  Perhaps this example gives you what you are looking for.
  dat <- read.FCS(system.file("extdata","0877408774.B08",
     package="flowCore"))
  rg <- rectangleGate(filterId="myRectGate", list("FSC-H"=c(200, 600),
     "SSC-H"=c(0, 400)))
  res <- filter(dat, rg)

  ##-------------
  tmp <- as(res, "logical")
  inside <- sum(tmp)
  totalCount <- length(tmp)
  pr <- inside*100/totalCount

On 09/13/2010 07:51 AM, Roger Leigh wrote:
> Hi,
>
> [Apologies if this is delivered twice; my initial mail didn't
> appear to be accepted for several hours, and I didn't see any
> failure message; does the list object to GPG signatures?]
>
> I've just started to use FlowCore/FlowViz to analyse some of my
> flow cytometry data, and ran into a few problems.  I'm hoping
> that you might be able to point me in the right direction!
>
> I've been very pleased with it so far, and have got some nice
> plots and stats out of it, but I'm sure I'm doing some things
> very inefficiently and/or incorrectly!
>
> [I'm using R version 2.11.1 (2010-05-31) on x86_64-pc-linux-gnu
> (Debian GNU/Linux) with current Bioconductor packages)]
>
>
> 1) Gating with an ellipsoidGate
>
> cov <- matrix(c(400000000, 0, 0, 0.08), ncol=2,
>               dimnames=list(c("FS.Lin", "SS.Log"), c("FS.Lin", "SS.Log")))
> mean <- c("FS.Lin"=32000, "SS.Log"=2.8)
> cells <- ellipsoidGate(filterId="CellGate", .gate=cov, mean=mean)
>
> I want to select my cells using an ellipsoid gate on forward- and side-
> scatter plots.  In the above situation, they lie in a region where
> FS=32000±10000 and log₁₀(SS)=2.8±0.5.  However, the values in the
> covariance matrix don't match the dimensions; is there any explanation
> regarding how to construct a covariance matrix from the actual
> dimension I want (I got the above by trial and error until it fitted
> nicely--I'm afraid I know little about these matrices).
>
> In some plots I'd also like to rotate the ellipse, but I'm not sure
> how to put this into the matrix, if that's the way to do things.
> Is this possible?
>
> Is there an alternative constructor to create a gate from real
> dimensions?
>
>
> 2) Plotting on a log scale
>
> Currently I'm transforming my "Log" data with a log₁₀ transform:
>
> flowset <- transform(flowset, `SS.Log` = log10(`SS.Log`))
> flowset <- transform(flowset, `FL.1.Log` = log10(`FL.1.Log`))
>
> When plotting, I have the log values on the axes.  However, is
> it possible to either:
>
>   a) plot on logarithmic axes with the appropriate log binning, or
>   b) change the axis labels to be a log scale (even though the
>      axes are linear)?
>
> I can do this with regular R plots, but I'm not sure if it's possible
> with lattice graphics.
>
>
> 3) Plotting with lattice/cairo_pdf is broken
>
> cairo_pdf("fsss.pdf", width=8, height=8, pointsize=12, antialias="none")
> xyplot(`SS.Log` ~ `FS.Lin`, set, filter=cells, xlab="FS", ylab=expression(log[10]~(SS)))
> dev.off()
>
> This is a weird one.  If I type or paste the above into an interactive R
> shell, the plot is written as a PDF file.  It works just fine.
> If I source it with 'source("script.R")', it does not.  The PDF file is
> never created/replaced, and from the speed of it and putting print()
> around each call, it looks like the plotting is entirely skipped.
>
> I can't reproduce this with regular plots, so maybe it's the lattice
> graphics rather than cairo_pdf?
>
>
> 4) Overlaying density plots
>
> By default, density plots are stacked down the y-axis with a user-
> specified overlap.  Is is possible to directly overlay one-
> dimensional histograms on top of each other to allow direct
> comparison?  
>
>
> 5) Updating phenoData
>
> I need to add certain information to my flowSet, so I'm currently
> doing this by getting the phenoData, adding the missing bits and
> setting it back:
>
> sampleNames(flowset) <- c("unstained", "isotype", "ng2")
>
> pd <- pData(phenoData(flowset))
> pd["isotype"] <- c(FALSE, TRUE, FALSE)
> pd["description"] <- c("Unstained", "Isotype", "NG2")
>
> pData(phenoData(flowset)) <- pd
> varMetadata(phenoData(set))["labelDescription"] <- c("Name", "Isotype", "Description")
>
> This is quite long-winded.  Is there a convenience function present
> (or planned?) to add a single parameter which could update the
> phenodata and varmetadata at the same time?  It would ensure they
> never get out of sync.
>
>
> 6) fsApply for a single column
>
> There's a convenience each_col to apply a function to each column in the
> flowSet, but is it possible to just access a single column, or set of
> specified columns?  I wrote this little helper:
>
> capply <- function (col, func) {
>   function(x) { func(x at exprs[,col]) }
> }
>
> which can then be used like so:
>
> fsApply(flowset, capply("FL.1.Log", median))
> fsApply(flowset, capply("FL.1.Log", sd))
>
> to get the median and sd for a single channel.  However, it
> doesn't work for mean, and I'm not sure why:
>
> fsApply(set2, capply("FL.1.Log", mean))
> Error in FUN(if (use.exprs) exprs(y) else y, ...) : 
>   could not find function "func"
>
> If I call it directly, it works just fine:
>
> mean(flowset[[1]]@exprs[,"FL.1.Log"])
> [1] 0.4648052
>
> It's not clear do me what the difference is here between the
> two forms.  Is there a built-in or better way to achieve the
> same thing?
>
>
> 7) Getting at filter summary data
>
>   
>> positive <- filter(flowset, pos)
>> summary(positive)
>>     
> filter summary for frame 'isotype'
>  positive+: 111 of 9186 events (1.21%)
>
> filter summary for frame 'pos'
>  positive+: 12070 of 13246 events (91.12%)
>
> Is it possible to actually get at the raw data here?  i.e.
> the raw event number and/or the percentages.  I'm currently
> doing it the "hard way", by:
>
> newset <- Subset(flowset, pos)
> fsApply(newset, capply("FL.1.Log", length)) / fsApply(flowset, capply("FL.1.Log", length)) * 100
>
> But there must be an easier and more efficient way of extracting this
> information!  Looking at the filter object, I can only extract the
> strings as printed.
>
>
> Many thanks for all your help,
> Roger
>
>