[BioC] normalizing RNAseq with batch/block-level bulk DE

Thu Oct 25 22:39:17 CEST 2012

Hi Aaron

I would try this, although whether it works and makes sense you will only know afterwards (or never :-)).

	Best wishes
	Wolfgang

On Oct 25, 2012, at 10:28 PM, Aaron Mackey <amackey at virginia.edu> wrote:

> I meant that the experimental design contains both factors of interest and nuisance factors, and both contribute to variation across samples.  While the factors of interest may relate to a small number of gene/isoform changes (with potentially large magnitudes), the nuisance-factor differences are far more abundant, though usually of much smaller magnitude.  I wish we had spike-ins, but we'll consider coming up with a category of "constant" genes.  What do you think about identifying such a list via an automated, iterative bootstrap: select the 500 genes with minimum coefficient of variation across the experiment (ignoring the design), run estimateSizeFactors with these, then recalculate cpm; re-select the best 500 control genes and keep iterating until the selected genes and/or the size factors stabilize?
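A minimal Python/NumPy sketch of the iterative control-gene selection proposed above, for illustration only. Several assumptions are not from the thread: `counts` is a genes x samples array of raw counts; the first pass uses plain library-size (cpm-like) scaling; only genes with non-zero counts in every sample are eligible as controls; the size factors are a re-implementation of the DESeq-style median-of-ratios estimator, not a call into DESeq; and all function names are hypothetical. If cpm is recomputed with effective library sizes proportional to the size factors, it differs from the normalized counts used here only by a common constant, which does not change the CV ranking.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """DESeq-style median-of-ratios size factors for a genes x samples matrix.

    Genes with a zero in any sample have an infinite log geometric mean
    and are skipped.
    """
    with np.errstate(divide="ignore"):
        log_counts = np.log(np.asarray(counts, dtype=float))
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    if not usable.any():
        raise ValueError("no gene has non-zero counts in every sample")
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


def coefficient_of_variation(values):
    """Per-gene (row-wise) coefficient of variation across samples."""
    mean = values.mean(axis=1)
    sd = values.std(axis=1, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(mean > 0, sd / mean, np.inf)


def iterate_control_genes(counts, n_controls=500, max_iter=50):
    """Hypothetical helper: alternate low-CV gene selection and size factors.

    Returns (control gene indices, size factors, number of passes).
    """
    counts = np.asarray(counts, dtype=float)
    # Only genes observed in every sample can serve as controls here.
    eligible = (counts > 0).all(axis=1)
    if eligible.sum() < n_controls:
        raise ValueError("fewer eligible genes than n_controls")

    # First pass: plain library-size scaling (cpm up to a constant),
    # rescaled to geometric mean 1; the design is ignored throughout.
    lib_sizes = counts.sum(axis=0)
    size_factors = lib_sizes / np.exp(np.mean(np.log(lib_sizes)))

    controls = None
    for n_pass in range(1, max_iter + 1):
        normalized = counts / size_factors[None, :]
        cv = coefficient_of_variation(normalized)
        cv[~eligible] = np.inf
        # The n_controls genes with the smallest CV become the new controls.
        new_controls = np.sort(np.argsort(cv, kind="stable")[:n_controls])
        if controls is not None and np.array_equal(new_controls, controls):
            break  # selection is stable; size_factors already match it
        controls = new_controls
        size_factors = median_of_ratios_size_factors(counts[controls])
    return controls, size_factors, n_pass
```

The loop is capped at `max_iter` passes because nothing guarantees that the selection settles. Since the design is ignored, genes driven by any factor (of interest or nuisance) tend to drop out of the control set, which is the intent here.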
> 
> Thanks,
> -Aaron
> 
> 
> On Thu, Oct 25, 2012 at 3:35 PM, Wolfgang Huber <whuber at embl.de> wrote:
> Dear Aaron
> 
> DESeq's variance stabilising transformation does not do normalisation.
> 
> By "deal with large scale differences in mean 'baseline' expression across experimental blocks", do you mean that you are considering a comparison between different biological conditions in which you expect the expression levels of many genes to change? The best approach here is to work with a set of negative control genes: these can be either spike-ins or a category of genes that you know should not change much. Then call 'estimateSizeFactors' on the data for these genes only, and apply the resulting size factors to all the data (using the replacement function 'sizeFactors<-').
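The negative-control recipe above can be sketched in Python/NumPy. This is an illustration, not DESeq's own code: it re-implements the median-of-ratios estimator on which DESeq's estimateSizeFactors is based, computes it on the control-gene rows only, and then divides every gene by the resulting per-sample factors (the role 'sizeFactors<-' plays in R). The function names and the boolean `control_mask` are hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix.

    Genes with a zero count in any sample have an infinite log geometric
    mean and are ignored.
    """
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)
    usable = np.isfinite(log_geo_means)
    if not usable.any():
        raise ValueError("no gene has non-zero counts in every sample")
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))


def size_factors_from_controls(counts, control_mask):
    """Estimate size factors from the control genes (rows) only."""
    counts = np.asarray(counts, dtype=float)
    return median_of_ratios_size_factors(counts[np.asarray(control_mask)])


def normalize(counts, size_factors):
    """Apply per-sample size factors to every gene, controls or not."""
    return np.asarray(counts, dtype=float) / size_factors[None, :]
```

For example, with `counts = [[10, 20], [5, 10], [100, 50]]` and the first two genes as controls, both controls double from sample 1 to sample 2, so the size factors come out as 1/sqrt(2) and sqrt(2), and the third gene's normalized values become about 141.4 and 35.4.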
> 
>         Best wishes
>         Wolfgang.
> 
> On Oct 25, 2012, at 5:43 PM, Aaron Mackey <amackey at virginia.edu> wrote:
> 
> > Is VST-normalization (à la DESeq) considered the right way to deal with
> > large scale differences in mean "baseline" expression across experimental
> > blocks?  Is there a normalization method that can take into account the
> > design matrix (or at least the batch/block columns)?  I don't want to
> > remove the batch/block effects, but TMM and friends all assume
> > near-constant expression across the design, which is violated by our
> > (nuisance) block-level differences in composition.  We see this when we
> > compare edgeR TMM-normalized log(cpm) to qRT-PCR data: TMM
> > normalization has smoothed out the block differences that the Ct values
> > still exhibit (although cpm and Ct remain strongly correlated, there is a
> > Ct "shift" for each block that is not seen in the cpm).
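One way to quantify the per-block Ct "shift" described above is sketched below in Python/NumPy; it is a hypothetical diagnostic, not something proposed in the thread. It assumes roughly 100% PCR efficiency (so log2 abundance is approximately a constant minus Ct), un-normalized Ct values, and the same genes assayed in every block. Under those assumptions log2(cpm) + Ct should be roughly constant within a gene, and if the RNA-seq normalization preserved the block differences seen by qRT-PCR, the per-block medians of log2(cpm) + Ct would agree. The function name is hypothetical.

```python
import numpy as np


def per_block_offsets(log2_cpm, ct, blocks):
    """Median of log2(cpm) + Ct within each block (hypothetical diagnostic).

    Each input is a flat array with one entry per (gene, sample) pair
    measured on both platforms; blocks holds that sample's block label.
    Assumes ~100% PCR efficiency, so log2(abundance) ~ constant - Ct.
    """
    log2_cpm = np.asarray(log2_cpm, dtype=float)
    ct = np.asarray(ct, dtype=float)
    blocks = np.asarray(blocks)
    combined = log2_cpm + ct  # roughly constant if both platforms agree
    return {
        block: float(np.median(combined[blocks == block]))
        for block in np.unique(blocks).tolist()
    }
```

A median one cycle higher in block B than in block A would mean that, relative to block A, qRT-PCR sees block B as about two-fold lower in abundance than the normalized cpm do.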
> >
> > Thanks in advance for any insights/thoughts on the issue,
> > -Aaron
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor