[BioC] Combat Continuous

Mon May 12 23:48:51 CEST 2014

Hi Michael,

On Mon, May 12, 2014 at 1:23 PM, Michael Breen
<breenbioinformatics at gmail.com> wrote:

> As far as deconvolution analysis, we currently are using the Zhong et al. (2013)
> DSA method using HaemAtlas to provide a signature matrix (or cell-type
> marker list). The method estimates cell proportions from mixed sample
> expression data, given a set of markers (HaemAtlas), i.e. features that are
> known to be exclusively expressed by a single cell type in the mixture.
> These analyses are, however, completely dependent upon your marker lists.

This is straightforward.

>
> Now that I have some reasonable cell-type frequencies I would like to
> explore the potential of either:
>
> A) correcting for these cell-type frequencies, as within ComBat
> B) using these frequencies as continuous variables in a linear model.
>
> I don't find option B) any more involved than that. Can you elaborate?

Well, we should first define your analysis goal more precisely. What
do you have in mind when you say "adjust gene expression for changes
in cell type composition"?

If you want cell-type specific expression, here's how the authors of
the paper I referred to earlier do it (and I agree): Gene expression
of a particular gene is presumably different in each cell type. So the
total expression of each gene is a sum of cell-type specific
expressions multiplied by the cell type abundances. This sort of looks
like a linear model, except that the coefficients multiplying the cell
type abundances are not constant - the cell type specific gene
expressions presumably also change between time points and it is these
changes people are usually after (this is why we need to define your
analysis goal more precisely).
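
Written out, the model I have in mind looks roughly like this. The
notation is my own shorthand, not taken from the paper: x is the measured
expression of gene g in sample s at time point t, p is the abundance of
cell type k in that sample, c is the cell-type specific expression, and
epsilon is noise.

```latex
x_{g,s}(t) = \sum_{k=1}^{K} p_{k,s}(t)\, c_{g,k}(t) + \varepsilon_{g,s}(t)
```

The point is that c depends on t: the "coefficients" of the abundances
are themselves the quantities of interest, and they are allowed to differ
between time points.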

To isolate the cell-type specific expression changes between time
points, you would then have to write a separate linear model at each
time point, figure out the cell-type specific expression as the
regression coefficient at each time point, then compare them (i.e.,
the regression coefficients). This of course assumes that at each time
point you have multiple samples, preferably many more than the number
of cell types you suspect you have in your sample, and the cell-type
specific expression of each gene within each cell type at each time
point can be considered constant.
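
A minimal sketch of that procedure for a single gene, on simulated data.
Everything here is an assumption made for illustration: the sample and
cell-type counts, the "true" expression values, the noise level, and the
helper names simulate and fit_cell_expr are hypothetical. It uses plain
least squares from NumPy, not any particular deconvolution package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: many more samples per time point than cell types.
n_samples, n_types = 40, 3


def simulate(cell_expr, noise_sd=0.05):
    """Simulate mixed-sample expression of one gene at one time point."""
    # Each row holds one sample's cell-type proportions (rows sum to 1).
    props = rng.dirichlet(np.ones(n_types), size=n_samples)
    # Total expression = sum over cell types of abundance * expression.
    mixed = props @ cell_expr + rng.normal(0.0, noise_sd, size=n_samples)
    return props, mixed


def fit_cell_expr(props, mixed):
    """Least-squares estimate of cell-type specific expression.

    No intercept is fitted: the proportions sum to 1, so an intercept
    column would be collinear with them.
    """
    coef, *_ = np.linalg.lstsq(props, mixed, rcond=None)
    return coef


# Made-up cell-type specific expression; only cell type 2 changes.
true_t1 = np.array([2.0, 5.0, 1.0])
true_t2 = np.array([2.0, 8.0, 1.0])

# One separate linear model per time point, then compare coefficients.
est_t1 = fit_cell_expr(*simulate(true_t1))
est_t2 = fit_cell_expr(*simulate(true_t2))
diff = est_t2 - est_t1

print("estimated change per cell type:", np.round(diff, 2))
```

In a real analysis you would also want standard errors for the
coefficients before calling a difference between time points meaningful,
and the estimates are only as good as the estimated proportions.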

If you simply adjust for cell type frequencies, it is not clear to me
how to interpret the resulting number.

Peter