[BioC] Limitations in edgeR?

Fri Apr 4 01:45:55 CEST 2014

Dear Eleanor,

Well, a couple of comments.

First, edgeR does not have a limitation on the number of genes it can 
run on.

I suggest that you upgrade to the most recent version of edgeR, which I 
suspect you do not have, and run

   y <- estimateDisp(y,design)
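
As a rough sketch of that call in context: the counts below are simulated 
as a stand-in for your table (an assumption, so the numbers themselves mean 
nothing), while the Family and Treatment factors are copied from your 
message below.

```r
library(edgeR)

# Simulated stand-in for the smaller piRNA table: 162 rows, 10 samples.
# These counts are hypothetical; substitute your own count matrix.
set.seed(1)
counts <- matrix(rnbinom(162 * 10, mu = 50, size = 5), nrow = 162)
colnames(counts) <- c("6C","6H","9C","9H","11C","11H","26C","26H","28C","28H")
rownames(counts) <- paste0("piRNA", seq_len(nrow(counts)))

# Paired design: family as a blocking factor plus the treatment effect
Family <- factor(c(6,6,9,9,11,11,26,26,28,28))
Treatment <- factor(c("C","H","C","H","C","H","C","H","C","H"))
design <- model.matrix(~Family+Treatment)
rownames(design) <- colnames(counts)

y <- DGEList(counts = counts)

# A single call estimates the common, trended and tagwise dispersions
y <- estimateDisp(y, design)

# Fit the GLM and test the treatment effect within families
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = "TreatmentH")
topTags(lrt)
```

estimateDisp() takes the place of the separate estimateGLMCommonDisp(), 
estimateGLMTrendedDisp() and estimateGLMTagwiseDisp() steps.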

Second, given that you have already analyzed the full set of piRNAs 
successfully, why in the world would you need to rerun the analysis on 
just half of them?  This does seem like a self-inflicted problem.

Gordon

> Date: Wed, 2 Apr 2014 09:58:23 -0700
> From: Eleanor Su <eleanorjinsu at gmail.com>
> To: Steve Lianoglou <lianoglou.steve at gene.com>
> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] Limitations in edgeR?
>
> Hi Steve,
>
> I'm running the same analysis on both datasets (the larger and the
> smaller). When I rerun the analysis on the smaller dataset (which actually
> IS half of the identities from the larger data set), I come across an error
> message when estimating glm trended dispersion. Here are the commands I'm
> using:
>
>> rawdata<-read.delim("piRNAtotalcount>10.txt", check.names=FALSE,
> stringsAsFactors=FALSE)
>> y <- DGEList(counts=rawdata[,2:11], genes=rawdata[,1])
>> Family<-factor(c(6,6,9,9,11,11,26,26,28,28))
>> Treatment<-factor(c("C","H","C","H","C","H","C","H","C","H"))
>> data.frame(Sample=colnames(y),Family,Treatment)
>   Sample Family Treatment
> 1      6C      6         C
> 2      6H      6         H
> 3      9C      9         C
> 4      9H      9         H
> 5     11C     11         C
> 6     11H     11         H
> 7     26C     26         C
> 8     26H     26         H
> 9     28C     28         C
> 10    28H     28         H
>> design<-model.matrix(~Family+Treatment)
>> rownames(design)<-colnames(y)
>> y<-estimateGLMTrendedDisp(y,design)
> Error in optim(par0, fun, y = y.nonzero[i, ], design = design, offset =
> offset.nonzero[i,  :
>      function cannot be evaluated at initial parameters
>
> I only encounter this error when running the smaller dataset.
>
> Best,
> Eleanor
>
>
>
> On Wed, Apr 2, 2014 at 9:49 AM, Steve Lianoglou <lianoglou.steve at gene.com> wrote:
>
>> Hi Eleanor,
>>
>> On Tue, Apr 1, 2014 at 11:09 AM, Eleanor Su <eleanorjinsu at gmail.com>
>> wrote:
>>> Hi All,
>>>
>>> I'm currently trying to analyze differential expression of piRNAs in some
>>> small data sets but am coming across issues that I didn't before when I
>>> analyzed with a larger data set. The larger data set contained 324 piRNA
>>> identities while the smaller data set contained half as many piRNA
>>> identities. Is there a minimum number of gene identities required in
>>> order to analyze differential expression in edgeR?
>>
>> It's hard to help without knowing what the issues are that you are
>> running into, so ... what's going wrong?
>>
>> One way you could explore this question yourself is to use the larger
>> (324 piRNA) dataset that "went well" and simply take half of the data
>> from it and rerun the same analysis on the smaller set. Do you get
>> different results?
>>
>> While you're playing with that idea, please provide a follow up email
>> with more specific details about what the issues are that you are
>> running into with your new (smaller) dataset.
>>
>> HTH,
>> -steve
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Genentech
