[BioC] Remove batch effect in small RNASeq study (SVA or others?)

Gordon K Smyth smyth at wehi.EDU.AU
Mon Apr 28 12:31:25 CEST 2014


> Date: Sun, 27 Apr 2014 20:32:55 -0400
> From: shirley zhang <shirley0818 at gmail.com>
> To: Gordon K Smyth <smyth at wehi.edu.au>
> Cc: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: Re: [BioC] Remove batch effect in small RNASeq study (SVA or
> 	others?)
>
> Dear Dr. Smyth,
>
> Thank you very much for your quick reply. I did as you suggested by first
> getting log CPM values, then calling removeBatchEffect(). The PCA
> figure looks better than before, but there is still a batch effect.

I don't see how there could still be a batch effect.

Please give the code sequence you used to remove the batch effect and to 
make the PCA plot.

> I attached two PCA figures. One is based on log10(raw count), before
> calling cpm() and removeBatchEffect(). The other is after.
>
> Could you look at them and give me more suggestions? Would quantile
> normalization across samples be a good idea, since cpm() only
> normalizes within each sample?

You should normalize the data before using removeBatchEffect(), and 
quantile normalization is one possibility.
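
A minimal sketch of that order of operations, on simulated counts rather than the real data. The simulated matrix, the object names (counts, batch, corrected) and the choice of TMM via calcNormFactors() are assumptions made for illustration; quantile normalization with limma's normalizeBetweenArrays() is shown as a commented alternative.

```r
library(edgeR)   # also loads limma, which provides removeBatchEffect()

# Hypothetical simulated data: 1000 genes, 8 samples in batch 1, 7 in batch 2
set.seed(1)
ngenes <- 1000
batch <- factor(rep(c("b1", "b2"), times = c(8, 7)))
mu <- matrix(rexp(ngenes, rate = 1/200), ngenes, 15)  # same gene mean in every sample
shift <- exp(rnorm(ngenes, sd = 0.5))                  # gene-wise fold change in batch 2
mu[, batch == "b2"] <- mu[, batch == "b2"] * shift
counts <- matrix(rnbinom(ngenes * 15, mu = mu, size = 10), ngenes, 15,
                 dimnames = list(paste0("g", 1:ngenes), paste0("s", 1:15)))

# Normalize between samples first, then compute log-CPM
y <- DGEList(counts = counts)
y <- calcNormFactors(y)                        # TMM normalization factors
logCPM <- cpm(y, log = TRUE, prior.count = 5)  # uses the normalized library sizes

# Alternative (assumption): quantile-normalize the log-CPM values instead
# logCPM <- normalizeBetweenArrays(logCPM, method = "quantile")

# Remove the batch effect from the normalized log-CPM values
corrected <- removeBatchEffect(logCPM, batch = batch)

# PCA on samples, before and after correction, coloured by batch
pca.before <- prcomp(t(logCPM))
pca.after  <- prcomp(t(corrected))
plot(pca.after$x[, 1:2], col = as.integer(batch), pch = 16)
```

If the batch effect has been removed, the two batches should no longer separate along the leading principal components. The PCA has to be computed from the corrected log-CPM values, not from the raw counts.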

Gordon

> Thanks again for your help,
> Shirley
>
>
> On Sun, Apr 27, 2014 at 6:54 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Shirley,
>>
>> I would probably do it like this:
>>
>>   library(edgeR)
>>   # y holds the read counts; batch is a factor giving each sample's batch
>>   logCPM <- cpm(y,log=TRUE,prior.count=5)
>>   logCPM <- removeBatchEffect(logCPM, batch=batch)
>>
>> Best wishes
>> Gordon
>>
>>> Date: Sat, 26 Apr 2014 10:51:23 -0400
>>> From: shirley zhang <shirley0818 at gmail.com>
>>> To: Bioconductor Mailing List <bioconductor at stat.math.ethz.ch>
>>> Subject: [BioC] Remove batch effect in small RNASeq study (SVA or
>>>         others?)
>>>
>>> I have paired-end RNA-Seq data from two batches (8 samples from batch1
>>> and 7 samples from batch2). After alignment using TopHat2, I got counts
>>> using HTseq-count and FPKM values via Cufflinks. A big batch effect can
>>> be seen in PCA using both log10(raw count) and log10(FPKM) values.
>>>
>>> I can NOT use the block factor in edgeR to remove the batch effect, since
>>> I need to first obtain residuals after adjusting for the batch effect, then
>>> test the residuals against hundreds of thousands of SNPs (eQTL analysis).
>>>
>>> My question is how to remove the batch effect without using edgeR:
>>>
>>> 1. Is SVA OK for such a small sample size (N=15)?
>>> 2. If SVA does not work, any other suggestions?
>>>
>>> Many thanks,
>>> Shirley
