[BioC] How to look at the output of the Poisson data model
Mark Robinson
mrobinson at wehi.EDU.AU
Fri Oct 3 03:09:19 CEST 2008
Hi Milena.
See comments below.
> When fitting the moderated negative binomial model, if alpha is very
> small, does it indicate that the model is not suitable?
>
> With my data, I got alpha=1.632020e-08.
Not at all. alpha is analogous to the prior degrees of freedom in a
limma analysis (but not on such a nice scale). Basically, alpha
controls how much moderation is done. There is a lot more to say
about this, but it's best to read the Bioinformatics paper and ask me
if you have any questions.
What might be of interest is the smoothed dispersion estimates. For
example:
1/ms$r[1:100] # first 100 dispersion estimates
If these are all close to 0, then that's a good indication that the NB
is basically just Poisson (since Poisson is the special case where the
dispersion = 0).
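
To make the "dispersion = 0 is Poisson" point concrete: under the usual
mean/dispersion parameterisation the NB variance is mu + phi * mu^2, so
the extra term vanishes as phi goes to 0. A minimal base-R sketch; the
helper name nb_variance and the mu/phi values are made up purely for
illustration and are not taken from any real fit:

```r
# NB variance under the mean/dispersion parameterisation:
#   var = mu + phi * mu^2   (phi = dispersion; phi = 0 gives the Poisson variance mu)
# nb_variance is a hypothetical helper written for this illustration.
nb_variance <- function(mu, phi) mu + phi * mu^2

mu  <- c(5, 50, 500)        # illustrative tag means
phi <- c(0, 1e-8, 0.1)      # illustrative dispersions (compare with 1/ms$r)

v <- outer(mu, phi, nb_variance)   # rows index mu, columns index phi
dimnames(v) <- list(paste0("mu=", mu), paste0("phi=", phi))

print(v)        # variance for each mu/phi combination
print(v / mu)   # ratio to the Poisson variance; 1 means "no extra variation"
```

A ratio that stays near 1 across the range of means you actually observe
suggests the NB fit is behaving like a Poisson fit for those tags.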
> Continued with the process and the result of topTags() is that all
> tags come up with a P.Value=0 and adj.P.Val=0.
> Additionally, when plotting an MAplot, blue dots fall around the
> middle of the scatter, around an M value of 0, rather than on the
> edges of the scatter.
> (see code below)
I bet these are all very low total counts. It appears that this is a
bug, but it's not something I've come across in my own tests, so I
don't know what is causing it. Do you mind making your data available
for me? Feel free to add dummy rownames/colnames so I don't know what
the data is. Email me offline if you can.
> Would this reflect that the negative binomial is not the right
> model? or more likely this would be an artifact or me doing
> something wrong?
>
> In contrary when I fit the Poisson model and subsequently do an
> MAplot, it looks more like what I would have expected from an array,
> where the highlighted tags locate to the edges of the scatter, away
> from an M value of 0. Would this mean that the Poisson distribution
> is a better fit?
Not necessarily. Poisson versus NB is a question of whether there is
'extra-Poisson' variation, i.e. more variability than the Poisson
model explains. You won't be able to tell that from alpha values or MA
plots. There are goodness-of-fit tests for this, but I don't know how
well they'll work in small samples ... perhaps a mean-variance plot of
the pseudo data would be a good starting point.
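
A rough sketch of such a mean-variance check in base R. Everything here
is an assumption for illustration: the counts are simulated (one Poisson
matrix, one NB matrix with dispersion 0.2), and mean_var is a
hypothetical helper. With real data you would pass in your own
tags-by-libraries matrix of library-size-adjusted counts instead:

```r
set.seed(1)
n_tags <- 2000
n_libs <- 4

# Per-tag means spread over 1..1000 on the log scale (illustrative only).
tag_mu <- exp(runif(n_tags, log(1), log(1000)))
mu_vec <- rep(tag_mu, n_libs)   # column-major fill: row i gets tag_mu[i]

# Simulated stand-ins for real data: rows are tags, columns are libraries.
pois_counts <- matrix(rpois(n_tags * n_libs, mu_vec), nrow = n_tags)
nb_counts   <- matrix(rnbinom(n_tags * n_libs, mu = mu_vec, size = 1 / 0.2),
                      nrow = n_tags)

# Hypothetical helper: per-tag mean and sample variance across libraries.
mean_var <- function(counts) {
  data.frame(mean = rowMeans(counts), var = apply(counts, 1, var))
}

mv_pois <- mean_var(pois_counts)
mv_nb   <- mean_var(nb_counts)

# Median variance/mean ratio; values well above 1 point to extra-Poisson variation.
ratio_pois <- median((mv_pois$var / mv_pois$mean)[mv_pois$mean > 0])
ratio_nb   <- median((mv_nb$var / mv_nb$mean)[mv_nb$mean > 0])
print(c(poisson = ratio_pois, nb = ratio_nb))

# Log-log mean-variance plot with the Poisson line (variance = mean) for reference.
if (interactive()) {
  keep <- mv_nb$mean > 0 & mv_nb$var > 0
  plot(mv_nb$mean[keep], mv_nb$var[keep], log = "xy", pch = ".",
       xlab = "mean", ylab = "variance")
  abline(0, 1, col = "red")   # on log-log axes this draws variance = mean
}
```

Points sitting along the reference line are consistent with Poisson;
a cloud that bends upward at higher means suggests overdispersion. With
only a handful of libraries the per-tag variances are noisy, so read the
overall trend rather than individual tags.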
My observation has been that for SAGE data with *biological*
replicates there is definitely more variation than explained by
Poisson (i.e. go with NB, moderation of dispersion is helpful), but
next gen sequencing on technical replicates looks Poisson. I'd be
happy to hear other people's impressions.
Cheers,
Mark
------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852