[BioC] edgeR question tags per million

Mark Robinson mrobinson at wehi.EDU.AU
Wed Jun 16 14:17:59 CEST 2010


Hi David.

We generally recommend raw counts as input to edgeR.  Presumably you can
get back to these by reversing the TPM calculation: multiply the TPM in a
library by the total number of reads in that library and divide by
1 million.  I presume this information is available.
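The reverse calculation is simple arithmetic, so here is a minimal sketch of it in Python (used only for illustration; in practice this would be done in R before building the edgeR object). The function name `tpm_to_counts` and the library sizes below are hypothetical; the key assumption is that the true total read count of each library is known. Using each library's own total, rather than a flat factor of 4, preserves the real differences in sequencing depth between libraries.

```python
from typing import Dict, List, Tuple


def tpm_to_counts(tpm: List[float], total_reads: int) -> List[int]:
    """Undo a tags-per-million scaling for one library.

    counts = TPM * total_reads / 1e6, rounded to the nearest integer
    (Python's round() sends exact halves to the nearest even number).
    """
    scale = total_reads / 1_000_000
    return [round(t * scale) for t in tpm]


# Hypothetical data: the same TPM values in two libraries of different depth.
libraries: Dict[str, Tuple[List[float], int]] = {
    "lib1": ([250.0, 12.5, 0.0], 4_000_000),
    "lib2": ([250.0, 12.5, 0.0], 3_200_000),
}

counts = {name: tpm_to_counts(tpm, n) for name, (tpm, n) in libraries.items()}
```

Identical TPM values map to different counts (1000 vs. 800 for the first tag) once the actual depths are restored. In edgeR, the recovered counts and the real library sizes would then be passed to `DGEList` via its `counts` and `lib.size` arguments.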

That is some seriously high dispersion.  You may also want to look into
normalization -- see ?calcNormFactors or see the manual.
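To put that dispersion in perspective: edgeR models counts with a negative binomial distribution whose variance is mu + phi * mu^2, where phi is the dispersion, so sqrt(phi) (the 1.76 quoted below) is the biological coefficient of variation. The short Python sketch below (illustrative only, with hypothetical mean values) evaluates that variance for the reported common dispersion and shows the coefficient of variation approaching sqrt(phi) as the mean grows, far above the Poisson variance of mu.

```python
import math


def nb_variance(mu: float, phi: float) -> float:
    """Negative binomial variance in the mean/dispersion form used by edgeR."""
    return mu + phi * mu ** 2


phi = 3.101696            # common dispersion reported in the question
bcv = math.sqrt(phi)      # biological coefficient of variation, about 1.76

# Coefficient of variation (sd / mean) at a few hypothetical mean counts;
# the Poisson part (mu) becomes negligible and sd / mean tends to bcv.
cv_by_mean = {mu: math.sqrt(nb_variance(mu, phi)) / mu for mu in (10, 100, 1000)}
```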

Cheers,
Mark

> Hi,
> I'm going through a sequencing dataset (GSE21630).
> It's an experiment with 46 different libraries, and it has a large dispersion:
>
>  > d$common.dispersion
> [1] 3.101696
>
>  > sqrt(d$common.dispersion)
> [1] 1.761163
>
> I guess I need to cope with that, as the sequencing libraries were
> prepared with an old protocol that is not as efficient as the current
> Solexa 1.5 protocol.
>
> That being said, the libraries are given in tags per million (TPM).
> On average the experiment had 4 million reads per library. I was
> wondering how the edgeR analysis would be affected by using these
> values (counts scaled to a 1 million read library, i.e. TPM) rather
> than counts based on the total number of reads.
>
> What do you think? Should I multiply the current TPM values by 4 to
> get read counts, or can I keep all my libraries at 1 million reads?
>
> thanks,
> david
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>






More information about the Bioconductor mailing list