[BioC] mRNA-seq cross-species analysis, is it possible?

Fri Aug 5 07:58:25 CEST 2011

Hi Mali

> So if I understand correctly, after I add the offset to the DGEList, I continue with estimating common dispersion (so skipping the calcNormFactors step).

Yes, that is correct. If you add your own offset to the DGEList then
the normalization factors computed by calcNormFactors are not used.
(When d$offset is not NULL, the software always uses it for the offset
in preference to d$samples$norm.factors.) Once you've added the offsets
to the DGEList object you proceed with dispersion estimation and DE
testing as usual.
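
As a rough illustration only, a gene-by-sample offset matrix could be
put together in base R along the following lines. The toy counts, the
gene.length matrix and the choice of log(library size) + log(gene
length) as the offset are all assumptions made for this sketch; the
right offset for your data may well be different.

```r
# Toy data: 4 genes x 3 samples (made-up numbers, for illustration only).
counts <- matrix(c( 10,  20,  30,
                     5,   8,  12,
                   100, 150,  90,
                    40,  35,  60),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Hypothetical gene lengths that differ between samples, e.g. because
# sample s3 comes from another species (assumed values).
gene.length <- matrix(c(1000, 1000, 1200,
                         500,  500,  450,
                        2000, 2000, 2100,
                         800,  800,  800),
                      nrow = 4, byrow = TRUE,
                      dimnames = dimnames(counts))

# Library size per sample.
lib.size <- colSums(counts)

# Offset on the log scale, same dimensions as the count matrix:
# log(library size) is added column-wise to log(gene length).
offset <- sweep(log(gene.length), 2, log(lib.size), "+")

# With edgeR loaded, the matrix would then be attached as described above
# (d being your DGEList object), skipping calcNormFactors:
#   d$offset <- offset
```

How the offset is picked up downstream can depend on the edgeR version,
so it is worth checking the documentation for the version you have
installed.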

> And if so, would you suggest to do quantile normalization after adjusting the read counts to gene length, and before estimating the common dispersion?

Quantile normalization (which could itself account for gene lengths)
may or may not be a sensible thing to do depending on your particular
experiment and data. I can't really advise sensibly on whether you
should or shouldn't do it. You can always try and see if it makes any
sense for your dataset.
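
If you do want to try it, a bare-bones quantile normalization can be
sketched in base R as below. The function name quantile_normalize and
the toy matrix of log length-adjusted values are made up for this
example; this is a generic version of the technique, not what edgeR or
cqn do internally.

```r
# Generic quantile normalization of the columns of a numeric matrix:
# every column is forced to share the same empirical distribution,
# namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each column
  sorted <- apply(x, 2, sort)                         # each column sorted ascending
  ref    <- rowMeans(sorted)                          # reference distribution
  out    <- apply(ranks, 2, function(r) ref[r])       # map ranks back to reference
  dimnames(out) <- dimnames(x)
  out
}

# Toy matrix of log length-adjusted expression values (assumed numbers).
x <- matrix(c(2.0, 3.5, 1.0,
              0.5, 0.7, 0.2,
              4.0, 4.4, 3.9,
              1.5, 1.0, 2.5),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

qn <- quantile_normalize(x)
```

After this step all columns of qn contain the same set of values, only
in a different gene order, which is a quick check that it did what was
intended.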

Whatever normalization you do, it must be done before estimating the
common dispersion or carrying out any further downstream inference on
DE.

Best wishes

Davis

---------- Forwarded message ----------
From: mali salmon <shalmom1 at gmail.com>
To: Kasper Daniel Hansen <kasperdanielhansen at gmail.com>
Date: Thu, 4 Aug 2011 09:00:45 +0300
Subject: Re: [BioC] mRNA-seq cross-species analysis, is it possible?
Thanks Davis and Kasper for your reply.
So if I understand correctly, after I add the offset to the DGEList, I
continue with estimating common dispersion (so skipping the calcNormFactors
step).
And if so, would you suggest to do quantile normalization after adjusting
the read counts to gene length, and before estimating the common dispersion?
Mali

On Thu, Aug 4, 2011 at 2:40 AM, Kasper Daniel Hansen <
kasperdanielhansen at gmail.com> wrote:

> On Wed, Aug 3, 2011 at 7:09 PM, Davis McCarthy <dmccarthy at wehi.edu.au>
> wrote:
>
> > We have used this gene-specific normalization factor to try out things
> > like quantile normalization on RNA-Seq data in house. To my knowledge, the
> > new cqn package outputs gene-specific offsets that will plug in to edgeR to
> > normalize data for (possibly among other things) gene length and GC bias.
>
> True, but the interface to the cqn normalization method assumes that
> the data is ordered in a "genes" by samples matrix and that all
> samples have the same length/gc content for a given gene.  The data we
> are discussing does not fit into this framework.  However, it might be
> possible to hack the function to deal with this, by running
>  cqn(..., sqn = FALSE)
> on each species separately, combining the output, and then doing a custom
> sqn normalization of the combined residuals.  Hmm, this clearly
> requires more than 60 secs. of thinking (and probably some careful
> looking at the output to get an idea of whether this gives useful
> output or makes things worse).
>
> Kasper
>


---------------------------------------------------------------------------
Davis J McCarthy
Research Technician
Bioinformatics Division
Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, Vic 3052, Australia
dmccarthy at wehi.edu.au
http://www.wehi.edu.au