[BioC] edgeR question tags per million

Wed Jun 16 11:19:00 CEST 2010

Hi,
I'm going through a sequencing dataset (GSE21630).
It is an experiment with 46 different libraries and a large dispersion:

 > d$common.dispersion
[1] 3.101696

 > sqrt(d$common.dispersion)
[1] 1.761163

I guess I need to cope with that, as the sequencing libraries were 
prepared with an old protocol that is not as efficient as the current 
1.5 Solexa protocol.

With that being said, the libraries are given in tags per million 
(TPM). On average the experiment had 4 million reads per library. I was 
wondering how the edgeR analysis would be affected by using reads 
normalized to a library size of 1 million (TPM) rather than the actual 
total number of reads.
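
To make the concern concrete, here is a rough sketch of how rescaling could change the variance structure. It assumes the usual negative binomial mean-variance relationship with dispersion \phi; the symbols y (raw count), \mu (its mean), c (scaling factor) and y' (scaled value) are introduced here only for illustration.

```latex
\begin{align*}
\operatorname{Var}(y)  &= \mu + \phi\,\mu^{2} \\
y' = c\,y,\quad \mu' = c\,\mu
  \;\Rightarrow\;
\operatorname{Var}(y') &= c^{2}\mu + c^{2}\phi\,\mu^{2}
                        = c\,\mu' + \phi\,\mu'^{2}
\end{align*}
```

For a library of 4 million reads expressed in TPM, c = 1/4, so the Poisson-like term becomes \mu'/4, whereas a model fitted directly to the TPM values would assume \mu'. The \phi\,\mu'^{2} term is unchanged, so under these assumptions the difference should matter mostly for low-abundance tags.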

What do you think? Should I multiply the current TPM values by 4, or 
can I keep all my libraries at 1 million reads?
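
For reference, the back-conversion I have in mind would look roughly like the sketch below. The TPM matrix and the library sizes are made-up placeholders (the real depths would have to come from the GEO records rather than the 4 million average), and the result is only an approximation of the original counts because of rounding.

```r
# Hypothetical TPM matrix: rows are tags, columns are libraries (made-up values)
tpm <- matrix(c(0.25, 10, 250,
                0.5, 12.5, 300),
              nrow = 3,
              dimnames = list(c("tag1", "tag2", "tag3"), c("libA", "libB")))

# Assumed total reads per library; placeholders, not values from GSE21630
lib.size <- c(libA = 4e6, libB = 4e6)

# Multiply each column by (library size / 1 million) and round to whole tags
counts <- round(sweep(tpm, 2, lib.size / 1e6, "*"))
storage.mode(counts) <- "integer"

print(counts)
```

The resulting matrix could then be passed to edgeR together with the same library sizes, e.g. DGEList(counts = counts, lib.size = lib.size).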

thanks,
david