[BioC] edgeR: a question about library size

Naomi Altman naomi at stat.psu.edu
Sun Jun 20 05:02:50 CEST 2010


I do not understand how we can do a multiplicative read count adjustment.

If K is Poisson(v) then E(K)=Var(K)=v.  If we do twice the sequencing
effort, then E(K)=Var(K)=2v.  But if we multiply by 2, then E(2K)=2v
and Var(2K)=4v.  So how can this type of adjustment work properly?
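
The mismatch can be checked numerically. Below is a minimal sketch in Python (my choice of language; the rate v = 5 and all function and variable names are hypothetical) that computes the exact mean and variance of 2K with K ~ Poisson(v), and of a count drawn directly from Poisson(2v), by summing over a truncated support.

```python
import math

def poisson_pmf(k, rate):
    # P(K = k) for K ~ Poisson(rate), evaluated in log space for stability
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def moments(values_and_probs):
    # Mean and variance of a discrete distribution given (value, probability) pairs
    mean = sum(x * p for x, p in values_and_probs)
    var = sum((x - mean) ** 2 * p for x, p in values_and_probs)
    return mean, var

v = 5.0               # hypothetical expected count at the original depth
support = range(200)  # truncated support; the tail mass is negligible here

# Multiplying the observed count by 2: values 2k with Poisson(v) probabilities
scaled_mean, scaled_var = moments([(2 * k, poisson_pmf(k, v)) for k in support])

# Sequencing twice as deep: a count drawn from Poisson(2v)
deeper_mean, deeper_var = moments([(k, poisson_pmf(k, 2 * v)) for k in support])

print("2K,  K ~ Poisson(v):  mean, variance =", scaled_mean, scaled_var)
print("K' ~ Poisson(2v):     mean, variance =", deeper_mean, deeper_var)
```

Both means come out at 2v, but the variances are 4v and 2v respectively, so a rescaled count is not distributed like a count obtained at greater depth.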

--Naomi

At 08:15 AM 6/17/2010, Mark Robinson wrote:
>Hi Raffaele.
>
>In my experience, you're better off with the number of mapped 
>reads.  But, a safer way is to do something data-driven.  For 
>example, TMM normalization (http://genomebiology.com/2010/11/3/R25) 
>is implemented in the calcNormFactors() function.  See also the docs 
>and the user's guide.
>
>Hope that helps.
>
>Cheers,
>Mark
>
>On 2010-06-17, at 10:00 PM, rcaloger wrote:
>
> > Hi,
> > I am using edgeR to detect differential expression in NGS experiments.
> > I have a brief question on what I should consider as the "total size
> > of my libraries".
> > In my case I have a set of samples with quite large variation in
> > library size:
> >
> > Sample  Total reads  Mapped reads
> > 1       11076283     8736308
> > 2       5881045      4006468
> > 3       7139703      5108608
> > 4       9089153      5643701
> > 5       9723103      8457914
> > 6       15570265     8706332
> > 7       15844448     12056310
> > 8       13375681     8663496
> > 9       14997114     8799752
> > 10      15744584     8555922
> > 11      4642056      3201515
> > 12      6458028      4277204
> > 13      13206724     9466118
> > 14      3035032      2148730
> >
> >
> > Should I pass as the lib.size parameter the values for the real
> > size of the libraries (Total reads), or simply the number of
> > mapped reads (Mapped reads)?
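
To see how much the choice matters for these 14 samples, here is a small sketch in Python (my choice of language; the variable and function names are hypothetical) that computes each sample's mapped fraction and its library size relative to the mean, under both definitions. The mapped fraction ranges from roughly 54% to 87%, so the two definitions do not rank or scale the samples identically.

```python
# Read counts copied from the table in the question (samples 1 to 14)
total_reads = [11076283, 5881045, 7139703, 9089153, 9723103, 15570265, 15844448,
               13375681, 14997114, 15744584, 4642056, 6458028, 13206724, 3035032]
mapped_reads = [8736308, 4006468, 5108608, 5643701, 8457914, 8706332, 12056310,
                8663496, 8799752, 8555922, 3201515, 4277204, 9466118, 2148730]

# Fraction of each library that mapped
mapped_fraction = [m / t for m, t in zip(mapped_reads, total_reads)]

def relative_sizes(sizes):
    # Each library size divided by the mean library size
    mean_size = sum(sizes) / len(sizes)
    return [s / mean_size for s in sizes]

rel_total = relative_sizes(total_reads)
rel_mapped = relative_sizes(mapped_reads)

print("sample  mapped_frac  rel_total  rel_mapped")
for i, (f, rt, rm) in enumerate(zip(mapped_fraction, rel_total, rel_mapped), start=1):
    print(f"{i:>6}  {f:11.3f}  {rt:9.3f}  {rm:10.3f}")
```

Samples where rel_total and rel_mapped differ most are the ones whose effective depth would be misjudged if total reads were used in place of mapped reads.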
> >
> > Thanks for the help
> > Raffaele
> >
> > --
> >
> > ----------------------------------------
> > Prof. Raffaele A. Calogero
> > Bioinformatics and Genomics Unit
> > Dipartimento di Scienze Cliniche e Biologiche
> > c/o Az. Ospedaliera S. Luigi
> > Regione Gonzole 10, Orbassano
> > 10043 Torino
> > tel.   ++39 0116705417
> > Lab.   ++39 0116705408
> > Fax    ++39 0119038639
> > Mobile ++39 3333827080
> > email: raffaele.calogero at unito.it
> >        raffaele[dot]calogero[at]gmail[dot]com
> > www:   http://www.bioinformatica.unito.it
> > Info: http://publicationslist.org/raffaele.calogero
> >
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>------------------------------
>Mark Robinson, PhD (Melb)
>Epigenetics Laboratory, Garvan
>Bioinformatics Division, WEHI
>e: m.robinson at garvan.org.au
>e: mrobinson at wehi.edu.au
>p: +61 (0)3 9345 2628
>f: +61 (0)3 9347 0852
>------------------------------

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111


