[BioC] Advice with edgeR

Paul Leo p.leo at uq.edu.au
Wed Aug 5 07:00:24 CEST 2009


HI Mark ,
Yes that is very helpful, additionally I have found that

edgeR does NOT work in my hands with R2.10 and Bioc 2.5 (generated that
error) see below for sessionInfo

But the SAME code DOES work for
R2.9.1 Bioc2.4


> sessionInfo()
R version 2.9.1 (2009-06-26) 
x86_64-pc-linux-gnu 

locale:
LC_CTYPE=en_AU.UTF-8;LC_NUMERIC=C;LC_TIME=en_AU.UTF-8;LC_COLLATE=en_AU.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_AU.UTF-8;LC_PAPER=en_AU.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_AU.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
base     

other attached packages:
[1] edgeR_1.2.4   Biobase_2.4.1

loaded via a namespace (and not attached):
[1] tcltk_2.9.1 tools_2.9.1

I Have not tried the svn version of edgeR with R2.10


Yes lane8 is a problem I am half thinking to killing it or try doing a
recall of the bases with a model approach.  The way this has been set up
is that there are 8100K regions from a chipseq experiment that contains
technical and biological replicates and a lots of potentially bad
regions have been included to cast a broad net- maybe too broad (love to
explicity include the effect of the technical replicate). I could kill
most of these additional regions off.. and check the common dispersion?

Can you share with me a rough feel for what is high/low group
dispersion ? I have no feel at the early stage what is out of the
ball-park and what is safe?


ALSO an additional question, if you permit...


On a conceptual level I was wondering if it makes sent to put count data
on a pedestal - for the explicit case of  performing differential
expression, in an analogous way that intensities in microarray data  can
be put of a pedestal (eg to make the fold changes look more uniform when
intensities are low).

In this instance the library size would confound that  as, I think,
adding the same constant to libraries of different sizes would not be
equivalent.

Have you found that putting the count data on a pedestal is at all
helpful or is that appropriate for this analysis  (again just for
differential expression)? I imagine that  the 
quantileAjust could be used to get pseudodata on the same library size
and then the constant could be added? Excuse my wild conjecturing...


Thanks again
Paul


-----Original Message-----
From: Mark Robinson <mrobinson at wehi.EDU.AU>
To: Paul Leo <p.leo at uq.edu.au>
Subject: Re: [BioC] Advice with edgeR
Date: Tue, 4 Aug 2009 15:28:42 +1000

Hi Paul.

Thanks for trying edgeR and giving your feedback.

I have seen this error before in certain situations.  There is a  
quantile adjustment step in there for the NB distribution and in  
difficult tails of the NB distribution, the quantile functions don't  
always return sensible results.  Ideally, we should have the code deal  
more gracefully for these.

I suspect the major factor for your data is the fact that lane8 has  
about 10-fold less total sequence than the rest of the lanes.  This  
probably pushes the limits of the quantile adjustment.  Is there a  
reason for this lane have so few reads?  I wonder if things would be  
better if you excluded that sample.

Another factor that may affect this is the variability.  The common  
dispersion seems quite high, although that may be simply related to  
the depth of sequencing you have.

Not sure I've been very insightful, but at least a few thoughts ...

Cheers,
Mark


On 03/08/2009, at 6:06 PM, Paul Leo wrote:

> I've just starting exploring edgeR. Gotten an error I am unsure of...
>
> I'm following the how-to with no changes
>
>> d
> DGEList:
> $data
>  lane3 lane8 lane5 lane6
> 1     0     0  1054   480
> 2    32     0   470   744
> 3     0     0   675   419
> 4     0     0   510   734
> 5   107     0   180  1138
> 81702 more rows ...
>
> $lib.size
> [1] 6482601  808363 7217381 7081739
>
> $group
> [1] ISOTYPE ISOTYPE CASE     CASE
> Levels: ISOTYPE CASE
>
>> alpha
> EBList:
> $alpha
> [1] 3.485649e-09
>
> $common.dispersion
> [1] 2.548059
>
> d contains no columns where all counts are zero but there are a lot of
> tags....
>> sum(apply(d$data,1,sum)==0)
> [1] 0
> running the following:
>
>
> chosen.lanes<-c("lane5","lane6","lane3","lane8")
> considered<-rownames(the.counts)[apply(the.counts[,chosen.lanes], 
> 1,sum)!
> =0
> d<-DGEList(data = the.counts[considered,chosen.lanes], group =
> c("CASE","CASE","ISOTYPE","ISOTYPE"),lib.size =
> total.counts[chosen.lanes,"count"])
> alpha<-alpha.approxeb(d)
> alpha
> ms<-deDGE(d,alpha=alpha$alpha)
>
> ##ERROR
> Calculating shrinkage overdispersion parameters.
> [quantileAdjust] Iteration (dot=1000) 1 :Error in approx(c(0, a +
> c(diff(a)/2, 0)), c(-0.5, 0:mx), xout = p[i,  :
>  approx(): attempted to interpolate NA values
>
> The error does not occur if I set doPoisson (yet) and I have a case
> where probably 1/2 of the "tags" are not differentially expressed.
>
> Any insight would be very welcome.
>
>
> Thanks
> Paul
>
>
>
>> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-07-24 r48986)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
> [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> base
>
> other attached packages:
> [1] edgeR_1.3.3   Biobase_2.5.4
>
> loaded via a namespace (and not attached):
> [1] tcltk_2.10.0 tools_2.10.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852



More information about the Bioconductor mailing list