[BioC] EdgeR: getting CPM values after batch effect correction

Ryan C. Thompson rct at thompsonclan.org
Thu Aug 29 09:00:43 CEST 2013


Hi Gilgi,

All you have to do is remove "batch" from the formula in your call to 
"model.matrix", and using the resulting matrix in your call to 
removeBatchEffect. Note, however, that for differential expression 
testing, you want to use the design matrix that you're already using and 
avoid removeBatchEffect.

For combining batch effect removal with the variance stabilizing 
transformation, I believe you would first apply the VST to get 
normalized, variance-stabilized logCPM, and then use removeBatchEffect 
on the result.

-Ryan

P.S. Please use "Reply All" when you reply to posts in order to ensure 
that the whole mailing list sees your reply.

On 08/28/2013 11:08 PM, Gilgi Friedlander wrote:
> Hi Ryan,
>
> Thanks a lot for your reply.
>
> The reason I used estimate dispersions is that I wanted to do also differential expression analysis.
>
> Regarding the removeBatchEffect - I am not sure I understand how to define the design and then how to define the batch. Do you happen to have an example how to use it?
>
> Thanks for the variance-stabilizing transformation tip. Would it be correct to pass it as the matrix given to removeBatchEffect?
>
> Thanks a lot,
> Gilgi
>   
> -----Original Message-----
> From: Ryan C. Thompson [mailto:rct at thompsonclan.org]
> Sent: Wednesday, August 28, 2013 11:02 AM
> To: Gilgi Friedlander
> Cc: bioconductor
> Subject: Re: [BioC] EdgeR: getting CPM values after batch effect correction
>
> Hi Gilgi,
>
> If you're just using edgeR for the CPM calculation, I believe there's no need to estimate dispersions; they won't be used in calculating the CPM values. You may wish to use normalized.lib.sizes=TRUE in the call to cpm to get CPM values that take into account the normalization factors computed by calcNormFactors (otherwise there's no point in doing that either).
>
> Second, according to the help text for removeBatchEffect, you're not supposed to include the batch effect in the design matrix. The design matrix should include all the experimental variables, while the batch variable should indicate the technical batching.
>
> Finally, instead of doing a simple logCPM transform, you might also try the variance-stabilizing transformation provided by the DESeq package, which is intended for clustering and machine learning types of analyses.
>
> -Ryan
>
> On 08/27/2013 10:52 PM, Gilgi Friedlander wrote:
>> Hi Ryan,
>>
>> Thanks a lot for the reply.
>>
>> I followed EdgeR user's manual, and defined a model:
>> y1<-DGEList(counts=countdata1,group=batch)
>> y1<- calcNormFactors( y1 )
>> design1 <- model.matrix(~batch+Treat1)
>>
>> batch has values 1 or 2, according to the batch of the experiment that was done.
>>
>> Treat has 10 different samples.
>>
>> In order to define a contrast I did the following:
>> y1 <- estimateGLMCommonDisp(y1, design1, verbose=TRUE)#Now we are
>> ready to construct an edgeR specific
>> y1 <- estimateGLMTrendedDisp(y1, design1)
>> y1 <- estimateGLMTagwiseDisp(y1, design1) lrt <-
>> glmLRT(fit,contrast=c(0,0,1,-1,0,0,0,0,0,0,0,0,0,0))
>>
>> I want now to get also the log counts after removal of the batch effect (for the purpose of clustering of the genes).
>>
>> Is it correct to obtain the batch removed log counts in the following way:
>>
>> logCPM <- cpm(y1, log=TRUE, prior.count=3)
>> logCPM_batchRemoved<-removeBatchEffect(logCPM,batch=batch,design=desig
>> n1)
>>
>> Many thanks,
>> Gilgi
>>
>> -----Original Message-----
>> From: Ryan [mailto:rct at thompsonclan.org]
>> Sent: Wednesday, August 28, 2013 2:46 AM
>> To: Gilgi Friedlander
>> Cc: bioconductor at r-project.org
>> Subject: Re: [BioC] EdgeR: getting CPM values after batch effect
>> correction
>>
>> You would have to define batch correction. Are you talking about fitting a model of the form "~ experimetalVar + batchEffect" and then subtracting out the batch effect coefficient?
>>
>> On Sun Aug 25 11:11:55 2013, Gilgi Friedlander wrote:
>>> Dear list,
>>>
>>> In edgeR, it possible to get CPM values after batch effect correction (and after TMM normalization)?
>>>
>>> Thanks a lot!
>>>
>>> 	[[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list