[BioC] BaseCounts & edgeR

Marco Groth mgroth at fli-leibniz.de
Tue Jan 31 10:29:19 CET 2012


Dear Bioconductor team,

I am using DESeq and edgeR to analyse count data and find 
differentially expressed genes, and so far this has worked very well. As 
input I used read counts, which I generated with Illumina's CASAVA and 
GenomeStudio. Unfortunately, Illumina has changed the counting procedure: 
as of version 1.8, read counting is no longer possible and has been 
replaced by base counting. This means that every mappable base that 
meets a given quality score is counted. So with high-quality 50 bp reads, 
the counts are around 50x higher than with the read count method.
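
To make the difference between the two counting schemes concrete, here is a minimal sketch in Python. It is not CASAVA's implementation: the function names, the toy reads, and the quality threshold of 20 are all assumptions chosen for illustration.

```python
# Toy illustration of read counting vs. base counting.
# Each read is represented as a list of per-base quality scores.
READ_LENGTH = 50   # read length mentioned in this post
MIN_QUALITY = 20   # hypothetical threshold, not CASAVA's actual setting


def read_count(reads):
    """Read counting: every mapped read contributes 1."""
    return len(reads)


def base_count(reads, min_quality=MIN_QUALITY):
    """Base counting: every mapped base with quality >= min_quality contributes 1."""
    return sum(1 for quals in reads for q in quals if q >= min_quality)


# Ten reads in which every base passes the threshold.
high_quality_reads = [[30] * READ_LENGTH for _ in range(10)]
print(read_count(high_quality_reads))  # 10
print(base_count(high_quality_reads))  # 500, i.e. 50x the read count

# One read with five low-quality bases contributes only 45 bases.
mixed_read = [[30] * 45 + [10] * 5]
print(base_count(mixed_read))          # 45
```

For reads that pass the threshold at every position, the base count is exactly the read count times the read length. Otherwise it is somewhat lower.
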
When I use the base counts in edgeR, the number of differentially 
expressed genes is dramatically reduced. We normally compare DESeq and 
edgeR results, and they agree to around 80%. With the base counts, the 
results no longer agree.
I divided all base counts by 50 to approximate the read counts, and the 
results look better, but not as good as before. There is also additional 
uncertainty because counting at splice sites is more complicated.
Nevertheless, my question is whether I can run edgeR on base counts 
and still get good results.
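
The division by 50 described above can be sketched as follows. This is only an illustration in Python, not the code I actually ran: the helper name and the example values are hypothetical. Rounding to the nearest integer is my assumption here, made because count-based models are formulated for integer counts.

```python
READ_LENGTH = 50  # read length used in this post


def approx_read_counts(base_counts, read_length=READ_LENGTH):
    """Approximate read counts from base counts.

    Divides each base count by the read length and rounds to the nearest
    integer (halves round up), using integer arithmetic only.
    """
    return [(2 * b + read_length) // (2 * read_length) for b in base_counts]


# Hypothetical base counts for five genes.
base_counts = [500, 1275, 49, 20, 0]
print(approx_read_counts(base_counts))  # [10, 26, 1, 0, 0]
```

Genes with very low base counts show why this is only an approximation: 20 bases round down to zero reads, although at least one read must have mapped there.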

Thanks, Marco

-- 

Marco Groth


