[BioC] RNA-Seq, generate batch-free count matrix

davide risso risso.davide at gmail.com
Mon Jul 21 18:36:56 CEST 2014


Hi Brian,

As Dario mentioned, the RUVSeq package has a way to deal with your situation.
RUVSeq is typically used for differential expression, where the batch
effect is included as a term in the generalized linear model rather than
removed from the counts themselves.

However, the RUVSeq functions return a matrix of normalized counts
that are "batch effect free". The main risk is that you remove (part
of) the signal of interest when removing the batch effects (especially
if the two are correlated). On the other hand, if the batch effect and
the signal of interest are "not too correlated" RUVSeq will give you
exactly what you want.
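
To make that risk concrete, here is a toy sketch in plain Python. It is
not RUVSeq: the per-batch mean-centering, the function names and the
simulated numbers are all assumptions chosen for illustration. One gene is
simulated on the log scale with a subtype effect of +2 and a batch effect
of +1, and the batch means are then subtracted.

```python
from statistics import mean

def remove_batch_means(log_values, batches):
    """Subtract each batch's mean from its samples, then add back the overall mean."""
    overall = mean(log_values)
    batch_mean = {
        b: mean(v for v, bb in zip(log_values, batches) if bb == b)
        for b in set(batches)
    }
    return [v - batch_mean[b] + overall for v, b in zip(log_values, batches)]

def group_difference(values, groups):
    """Mean of subtype B minus mean of subtype A."""
    a = mean(v for v, g in zip(values, groups) if g == "A")
    b = mean(v for v, g in zip(values, groups) if g == "B")
    return b - a

def simulate(groups, batches):
    """Hypothetical log expression: baseline 5, +2 for subtype B, +1 for batch 2."""
    return [
        5.0 + (2.0 if g == "B" else 0.0) + (1.0 if b == 2 else 0.0)
        for g, b in zip(groups, batches)
    ]

batches = [1, 1, 1, 1, 2, 2, 2, 2]
balanced_groups = ["A", "B", "A", "B", "A", "B", "A", "B"]    # subtypes spread over both batches
confounded_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]  # subtype coincides with batch

balanced = remove_batch_means(simulate(balanced_groups, batches), batches)
confounded = remove_batch_means(simulate(confounded_groups, batches), batches)

diff_balanced = group_difference(balanced, balanced_groups)        # subtype signal survives
diff_confounded = group_difference(confounded, confounded_groups)  # subtype signal is removed
print(diff_balanced, diff_confounded)
```

In the balanced design the subtype difference is still there after
correction; in the fully confounded design it disappears along with the
batch effect.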

If you have replicate samples, we found that in practice the
"replicate method" (function RUVs) works much better than the
"negative controls" method (function RUVg) when dealing with
unsupervised problems.
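
The idea behind using replicates can be sketched as follows. This is only
the first step (centering within replicate groups), written in plain
Python with made-up specimen names and values; it is not the RUVs
implementation. Because replicates of the same specimen share the same
biology, what remains after subtracting the replicate-group mean reflects
unwanted variation only.

```python
from statistics import mean

def center_within_replicates(values, replicate_ids):
    """Subtract the mean of each replicate group from its members."""
    group_mean = {
        r: mean(v for v, rr in zip(values, replicate_ids) if rr == r)
        for r in set(replicate_ids)
    }
    return [v - group_mean[r] for v, r in zip(values, replicate_ids)]

# Hypothetical design: three specimens, each sequenced once in batch 1 and once in batch 2.
replicate_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
batches = [1, 2, 1, 2, 1, 2]
biology = {"s1": 4.0, "s2": 6.5, "s3": 9.0}  # assumed true log expression per specimen
batch_shift = {1: 0.0, 2: 1.0}               # assumed batch effect on the log scale

log_expr = [biology[r] + batch_shift[b] for r, b in zip(replicate_ids, batches)]
residuals = center_within_replicates(log_expr, replicate_ids)
print(residuals)  # the specimen-specific biology cancels; only the batch pattern remains
```

The residuals no longer depend on which specimen a sample came from, which
is why replicate samples are so useful when there is no model for the
biology.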

I hope this helps.

Best,
davide

On Sat, Jul 19, 2014 at 8:01 PM, Brian Haas <bhaas at broadinstitute.org> wrote:
> Greetings all,
>
> I've been researching ways to remove batch effects from RNA-Seq count
> matrices.  Basically, I'm starting with a counts matrix that includes batch
> effects, and want to generate a new matrix of counts that has the batch
> effects removed.
>
> I'm looking to apply this to sets of RNA-Seq samples (~100 samples) that
> were sequenced in batches on different days (factor) and for which I also
> have other metadata with continuous values (covariates such as total
> sequenced reads in each sample, quality metrics, etc).   I want to study
> all these samples in an unsupervised manner, and don't have a model for
> anything but the various batch effects that I want removed (ie. no cancer
> vs. normal labeling, instead they're all 'normal' and I'd like to see if
> they form clusters based on natural variation in the population, and
> perhaps identify subtypes).
>
> From what I've read thus far, methods like sva (and the included ComBat)
> require that you provide a model for the covariates that you do not want
> removed (biological factors) in addition to the ones you do want removed
> (batch effects).   Is it not possible to use these methods in my scenario,
> where I don't have factors other than the specified batch effects?
>
> In searching the bioconductor mailing list archive, I found:
>
>        removeBatchEffect() function (provided by the limma package, which edgeR depends on)
>
> which seems to do exactly what I want, and I'll experiment with it shortly.
>  I'm mostly curious about what other methods might be available to do this,
> and whether the SVA or other libraries contain functions that I should
> explore.
>
> Many thanks in advance for any advice!
>
> ~brian



-- 
Davide Risso, PhD
Post Doctoral Scholar
Department of Statistics
University of California, Berkeley
344 Li Ka Shing Center, #3370
Berkeley, CA 94720-3370
E-mail: davide.risso at berkeley.edu


