[BioC] How to look at the output of the Poisson data model
Milena Gongora
m.gongora at imb.uq.edu.au
Fri Oct 3 02:15:43 CEST 2008
Hello Mark,
Excellent news. Thanks a lot for implementing it so quickly!
I used the Poisson model because I suspected that it might fit the data
better, as the negative binomial has given me somewhat unexpected
results. Maybe I can ask for your advice again....
When fitting the moderated negative binomial model, if alpha is very
small, does it indicate that the model is not suitable?
With my data, I got alpha=1.632020e-08.
Continued with the process and the result of topTags() is that all tags
come up with a P.Value=0 and adj.P.Val=0.
Additionally, when plotting an MAplot, blue dots fall around the middle
of the scatter, around an M value of 0, rather than on the edges of the
scatter.
(see code below)
Would this reflect that the negative binomial is not the right model? or
more likely this would be an artifact or me doing something wrong?
In contrary when I fit the Poisson model and subsequently do an MAplot,
it looks more like what I would have expected from an array, where the
highlighted tags locate to the edges of the scatter, away from an M
value of 0. Would this mean that the Poisson distribution is a better fit?
Many thanks,
Milena
#--------------------
grouping <- c(1,1,2,2)
lib_size <- as.numeric(apply(mydata,2,sum))
lib_size
324031 527758 370766 658864
d <- new("DGEList", list(data=as.matrix(mydata), group=grouping,
lib.size=lib_size))
head(d$data)
A.REP1 A.REP2 B.REP1 B.REP2
100168004_100169791_- 2 4 3 1
100175732_100176961_- 0 3 2 3
100471575_100982094_- 0 0 0 4
101147427_101152326_- 1 0 0 4
101147433_101152326_- 3 1 1 2
101152469_101153215_- 4 21 8 46
alpha <- alpha.approxeb(d)
alpha
EBList: alpha=1.632020e-08
ms <- deDGE(d, alpha=alpha$alpha)
ms
adj.p <- p.adjust(ms$exact, "fdr")
k <- (adj.p < 0.05)
plot((log2(ms$ps$p1) + log2(ms$ps$p2))/2, log2(ms$ps$p2/ms$ps$p1),
pch=19, xlab="A", ylab="M", col=c("black", "blue")[k+1])
boxplot(as.data.frame(sqrt(d$data)))
topTags(ms)
A M P.Value adj.P.Val
100471575_100982094_- -59.08348 31.88121 0 0
101943878_101944171_- -41.62854 -33.39554 0 0
102221513_102227940_- -42.43538 -32.58870 0 0
103547901_103548588_- -58.27195 33.50425 0 0
103859430_103859667_- -42.18712 -32.83696 0 0
104130431_104130871_+ -58.42573 33.19671 0 0
104146565_104146648_+ -58.67857 32.69102 0 0
105193815_105194263_+ -42.33242 -32.69166 0 0
111619777_111620478_- -42.56783 -32.45625 0 0
115429727_115447242_+ -42.28483 -32.73925 0 0
#--------------------
Mark Robinson wrote:
> Hi Milena.
>
> Thank you for your comments. I had originally added the Poisson special
> case as somewhat of an afterthought. But, it appears that it is
> worthwhile to have it covered by the 'deDGE' command itself. So, ...
>
> I've made the following changes:
>
> 1. In the 'deDGE' command, there is now an argument 'doPoisson', which
> when set to TRUE skips the dispersion moderation step, but still
> calculates the 'exact' test on the quantile-adjusted data. So, the
> Poisson analysis returns an object of the correct type for 'topTags'.
>
> 2. I've added a 'plotMA' method for 'deDGEList' objects, similar to
> limma's ... it saves some of the mucking about forming M and A that was
> previously done by hand.
>
> 3. Vignette updated.
>
> Sorry I hadn't thought of doing this sooner!
>
> Cheers,
> Mark
>
>
>
>
>
>> Dear Mark (and edgeR users),
>>
>> Thanks so much for your feedback. That solved the problem and also was
>> good advice regarding the low tag counts.
>>
>> I am facing another obstacle:
>> I tried fitting the Poisson model instead of the negative binomial. The
>> MAplot and boxplots from this method look good.
>> Now my problem is, how do I get a table with the resulting
>> differentially expressed tags of out of the Poisson model?
>> topTags() does not work on the object resulting from exactTestNB(),
>> because it is not a deDGEList...
>>
>> My full code is below.
>> Thanks again!
>>
>> #-----------------------
>>
>> d <- new("DGEList", list(data=as.matrix(mydata), group=grouping,
>> lib.size=lib_size))
>> d
>> head(d$data)
>> A.REP1 A.REP2 B.REP1 B.REP2
>> 100168004_100169791_- 2 4 3 1
>> 100175732_100176961_- 0 3 2 3
>> 100471575_100982094_- 0 0 0 4
>> 101147427_101152326_- 1 0 0 4
>> 101147433_101152326_- 3 1 1 2
>> 101152469_101153215_- 4 21 8 46
>>
>> qA <- quantileAdjust(d, r.init=1000, n.iter=1)
>>
>> par(mfrow=c(1,2))
>> boxplot(as.data.frame(sqrt(d$data)))
>> boxplot(as.data.frame(sqrt(qA$pseudo)))
>>
>> f <- exactTestNB(qA$pseudo, d$group, qA$N * qA$ps$p, r=1000)
>>
>> #------------------------
>>
>> Mark Robinson wrote:
>>
>>> Hi Milena.
>>>
>>> Small oversight on my part. If you use 'as.matrix(mydata)' in place of
>>> 'mydata', you should have no problems. This is imposed on the data
>>> matrix
>>> in the next version (which might take a day or so to come online).
>>>
>>> A couple of other small points:
>>>
>>> 1. You may want to use the new constructor (instead of creating the list
>>> yourself):
>>>
>>> d<-DGEList(data=y,group=grouping,lib.size=lib_size)
>>>
>>> 2. In terms of analysis of differential expression, there may not be
>>> much
>>> value in the rows of your table that have VERY low counts (e.g. your
>>> 100134814_100136947). In the past, I've noticed some instabilities in
>>> the
>>> NB calculations with very low counts and I've generally filtered out the
>>> rows with (say) 3 or less total counts.
>>>
>>> Cheers,
>>> Mark
>>>
>>>
>>>
>>>
>>>
>>>> Dear edgeR developers and users,
>>>>
>>>> Just started using edgeR with next generation sequence (count) data.
>>>> When calculating alpha, I am getting the following error:
>>>>
>>>> alpha <- alpha.approxeb(d)
>>>> [quantileAdjust] Iteration (dot=1000) 1 :Error in y1/matrix(rep(1,
>>>> nrow(y1)), ncol = 1) %*% lib.size1 :
>>>> non-numeric argument to binary operator
>>>>
>>>> I can't work out why I am getting this... any ideas?
>>>> Thanks!
>>>>
>>>> The full code is below:
>>>> --------------
>>>> mydata <- read.table("myFile.txt", header=TRUE, row.names="ID")
>>>> head(mydata)
>>>> A.REP1 A.REP2 B.REP1 B.REP2
>>>> 100011872_100012727_- 0 0 0 2
>>>> 100017569_100017878_- 1 0 0 0
>>>> 100134814_100136947_- 0 0 0 0
>>>> 100134931_100136947_- 0 2 0 0
>>>> 100137054_100138100_- 1 0 0 1
>>>> 100137831_100138100_- 1 0 0 0
>>>>
>>>> grouping <- c(1,1,2,2)
>>>> lib_size <- as.numeric(apply(mydata,2,sum))
>>>> lib_size
>>>> [1] 352812 573571 401573 719698
>>>>
>>>> d <- new("DGEList", list(data=mydata, group=grouping,
>>>> lib.size=lib_size))
>>>> alpha <- alpha.approxeb(d)
>>>> [quantileAdjust] Iteration (dot=1000) 1 :Error in y1/matrix(rep(1,
>>>> nrow(y1)), ncol = 1) %*% lib.size1 :
>>>> non-numeric argument to binary operator
>>>> ---------------------
>>>>
>>>> --
>>>> Milena
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor-
>>>>
>>>>
>> Milena
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>
>>
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Milena
More information about the Bioconductor
mailing list