[BioC] Limitations in edgeR?

Wed Apr 2 20:43:30 CEST 2014

Hi,

On Wed, Apr 2, 2014 at 9:58 AM, Eleanor Su <eleanorjinsu at gmail.com> wrote:
> Hi Steve,
>
> I'm running the same analysis on both datasets (the larger and the
> smaller). When I rerun the analysis on the smaller dataset (which actually
> IS half of the identities from the larger data set), I come across an error
> message when estimating glm trended dispersion.

Is it the same data as the bigger dataset, just cut in half? Or is it
a separate dataset where you only measured half of the features from a
previous experiment? Do you know what I mean?

If it's the former, what happens when you take the other half? :-)

The error you get is hard for me to debug, since I'm not familiar with
the internals of that code or the edge cases it might be susceptible
to. All I can really offer are (rather uninsightful) strategies to help
smoke out potential problems in your data itself. For instance, could
you take various subsets of your data to see if you can sidestep the
problem?
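
A minimal sketch of the "take the other half" idea in base R, assuming the counts are a plain matrix with one row per feature. The function `try_halves` and its `fit_step` argument are hypothetical: in practice `fit_step` could be a function that builds a DGEList from the subset and calls estimateGLMTrendedDisp() with the same design, so that whichever half fails can be split again.

```r
# Hypothetical helper: split the rows (features) of a count matrix into two
# random halves and run `fit_step` on each half, recording "ok" or the error
# message. Note that set.seed() resets the global random number generator.
try_halves <- function(counts, fit_step, seed = 1) {
  set.seed(seed)
  n <- nrow(counts)
  idx <- sample.int(n)                      # random permutation of row indices
  half <- n %/% 2
  halves <- list(
    first  = idx[seq_len(half)],
    second = idx[seq.int(half + 1, length.out = n - half)]
  )
  vapply(halves, function(rows) {
    tryCatch({
      fit_step(counts[rows, , drop = FALSE])  # result is ignored; only errors matter
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}
```

The result is a named character vector (`first`, `second`), so a failing half is easy to spot and bisect further with the same function.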

Also, are you running on bioc2.13 (R 3.0.3) with the latest version of
edgeR? (Providing the output of sessionInfo() at the end of posts
asking for help is usually a good idea.)

If you are not running the latest and greatest, upgrade.

Also, judging from your design matrix: is it true that you don't have
any replication? Was it the exact same design matrix that worked on
your previous dataset?
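
One quick way to see how much replication a design leaves over is to count its residual degrees of freedom, i.e. samples minus the rank of the design. A minimal base-R sketch, rebuilding the design from the factors in the quoted post (no edgeR calls involved):

```r
# Rebuild the design from the factors given in the quoted post.
Family <- factor(c(6, 6, 9, 9, 11, 11, 26, 26, 28, 28))
Treatment <- factor(c("C", "H", "C", "H", "C", "H", "C", "H", "C", "H"))
design <- model.matrix(~ Family + Treatment)

# Columns: intercept, 4 Family coefficients, 1 Treatment coefficient.
ncol(design)                         # 6
# A full-rank design has rank equal to its number of columns.
qr(design)$rank == ncol(design)      # TRUE
# Residual degrees of freedom left after fitting the design.
nrow(design) - qr(design)$rank       # 4
```

Running the same check on the design used for the larger dataset makes it easy to confirm whether the two analyses really share the same model.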

And, neither here nor there: is "piRNAtotalcount>10.txt" really the
name of your file? Although it may work for you (are you on Windows?),
it's not a bad idea to get into the habit of using more boring
filenames (">" is the output redirection operator in Linux shells).
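
If you want to catch such names before they bite, a small check is easy to write. A minimal sketch; `has_shell_special` is a hypothetical helper, and the character class is an assumption covering common shell metacharacters rather than an exhaustive list:

```r
# Hypothetical helper: flag filenames containing characters that common
# shells treat specially (redirection, pipes, globbing, quoting, spaces).
has_shell_special <- function(path) {
  grepl("[<>|&;*?$`'\" ]", basename(path))
}

has_shell_special("piRNAtotalcount>10.txt")      # TRUE
has_shell_special("piRNA_totalcount_gt10.txt")   # FALSE
```

A flagged file can then be renamed with base R's file.rename(), e.g. to something like piRNA_totalcount_gt10.txt (an illustrative name only).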

HTH,
-steve

> Here are the commands I'm
> using:
>
>> rawdata<-read.delim("piRNAtotalcount>10.txt", check.names=FALSE,
> stringsAsFactors=FALSE)
>> y <- DGEList(counts=rawdata[,2:11], genes=rawdata[,1])
>> Family<-factor(c(6,6,9,9,11,11,26,26,28,28))
>> Treatment<-factor(c("C","H","C","H","C","H","C","H","C","H"))
>> data.frame(Sample=colnames(y),Family,Treatment)
>    Sample Family Treatment
> 1      6C      6         C
> 2      6H      6         H
> 3      9C      9         C
> 4      9H      9         H
> 5     11C     11         C
> 6     11H     11         H
> 7     26C     26         C
> 8     26H     26         H
> 9     28C     28         C
> 10    28H     28         H
>> design<-model.matrix(~Family+Treatment)
>> rownames(design)<-colnames(y)
>> y<-estimateGLMTrendedDisp(y,design)
> Error in optim(par0, fun, y = y.nonzero[i, ], design = design, offset =
> offset.nonzero[i,  :
>       function cannot be evaluated at initial parameters
>
> I only encounter this error when running the smaller dataset.
>
> Best,
> Eleanor

-- 
Steve Lianoglou
Computational Biologist
Genentech