[BioC] How to look at the output of the Poisson data model

Fri Oct 3 03:09:19 CEST 2008

Hi Milena.

See comments below.

> When fitting the moderated negative binomial model, if alpha is very  
> small, does it indicate that the model is not suitable?
>
> With my data, I got alpha=1.632020e-08.

Not at all.  alpha is analogous to the prior degrees of freedom in a
limma analysis (but not on such a nice scale).  Basically, alpha
controls how much moderation is done.  There is a lot more to say
about this, but it's best to read the Bioinformatics paper and ask me
if you have any questions.

What might be of interest are the smoothed dispersion estimates.  For
example:

1/ms$r[1:100]  # first 100 dispersion estimates

If these are all close to 0, then that's a good indication that the NB
is basically just Poisson (since Poisson is the special case where the
dispersion is 0).
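
To make the special case concrete, here is a minimal sketch in base R
(it is not edgeR code).  It assumes the NB is parameterised by its mean
mu and size r, with dispersion phi = 1/r, so that the variance is
mu + phi * mu^2.  The values of mu and phi are made up for illustration.

```r
# Illustrative values only (not taken from any real data set)
mu  <- 5                      # mean count for a hypothetical tag
phi <- c(0.5, 0.01, 1e-6)     # dispersions, from clearly NB down to ~0
x   <- 0:30                   # counts at which to compare the two pmfs

# NB variance is mu + phi * mu^2; it tends to mu (Poisson) as phi -> 0
nb_var <- mu + phi * mu^2

# Largest pointwise gap between the NB and Poisson probabilities
max_gap <- sapply(phi, function(p) {
  max(abs(dnbinom(x, size = 1 / p, mu = mu) - dpois(x, lambda = mu)))
})

print(data.frame(dispersion = phi, nb_variance = nb_var, max_gap = max_gap))
```

The gap between the two distributions shrinks as the dispersion
approaches 0, which is why uniformly tiny dispersion estimates suggest
the data are effectively Poisson.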

> Continued with the process and the result of topTags() is that all  
> tags come up with a P.Value=0 and adj.P.Val=0.
> Additionally, when plotting an MAplot, blue dots fall around the  
> middle of the scatter, around an M value of 0, rather than on the  
> edges of the scatter.
> (see code below)

I bet these are all tags with very low total counts.  This looks like
a bug, but it's not something I've come across in my own tests, so I
don't know what is causing it.  Would you mind making your data
available to me?  Feel free to replace the rownames/colnames with dummy
names so I don't know what the data is.  Email me offline if you can.

> Would this reflect that the negative binomial is not the right  
> model? or more likely this would be an artifact or me doing  
> something wrong?
>
> In contrast, when I fit the Poisson model and subsequently do an  
> MAplot, it looks more like what I would have expected from an array,  
> where the highlighted tags locate to the edges of the scatter, away  
> from an M value of 0. Would this mean that the Poisson distribution  
> is a better fit?

Not necessarily.  Poisson versus NB is a question of whether there is
'extra-Poisson' variation, i.e. more variability than the Poisson
model explains.  You won't be able to tell that from alpha values or MA
plots.  There are goodness of fit tests for this, but I don't know how
well they'll work in small samples ... perhaps a mean-variance plot of
the pseudo data would be a good starting point.
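
As a rough sketch of what such a mean-variance check could look like,
the base R code below uses simulated counts only.  The matrix sizes,
tag means and the dispersion of 0.2 are all hypothetical; with real
data you would replace the simulated matrices with your own tags x
replicates matrix of pseudo counts for one group.

```r
set.seed(1)                          # reproducible simulated counts
n_tags <- 2000; n_reps <- 4          # hypothetical experiment size
mu <- exp(runif(n_tags, log(20), log(500)))   # hypothetical tag means

# Poisson counts and NB counts (dispersion 0.2, i.e. size = 1/0.2);
# row j of each matrix holds the replicates for the tag with mean mu[j]
pois <- matrix(rpois(n_tags * n_reps, mu), nrow = n_tags)
nb   <- matrix(rnbinom(n_tags * n_reps, size = 1 / 0.2, mu = mu),
               nrow = n_tags)

# Per-tag mean and variance across replicates
mean_var <- function(counts) {
  data.frame(mean = rowMeans(counts), var = apply(counts, 1, var))
}
mv_pois <- mean_var(pois)
mv_nb   <- mean_var(nb)

# Average variance-to-mean ratio: near 1 for Poisson, larger when
# there is extra-Poisson variation
ratio_pois <- mean(mv_pois$var / mv_pois$mean)
ratio_nb   <- mean(mv_nb$var / mv_nb$mean)
print(c(poisson = ratio_pois, nb = ratio_nb))

# Log-log mean-variance plot (drawn in interactive sessions only);
# the line var = mean is what Poisson variation would give
if (interactive()) {
  plot(mv_nb$mean, mv_nb$var, log = "xy", col = "red", pch = 16,
       cex = 0.4, xlab = "mean", ylab = "variance")
  points(mv_pois$mean, mv_pois$var, pch = 16, cex = 0.4)
  abline(0, 1)
}
```

Points scattered around the line suggest Poisson-like variation, while
a cloud that sits above the line and pulls further away at higher means
suggests overdispersion.  With only a few replicates the per-tag
variances are noisy, so look at the overall trend rather than
individual tags.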

My observation has been that for SAGE data with *biological*  
replicates there is definitely more variation than explained by  
Poisson (i.e. go with NB, moderation of dispersion is helpful), but  
next gen sequencing on technical replicates looks Poisson.  I'd be  
happy to hear other people's impressions.

Cheers,
Mark

------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852