[R] Detection Times and Poisson Distribution

Wed Oct 28 09:33:00 CET 2009

On Tue, 27 Oct 2009 12:11:42 -0700 (PDT) Ben Bolker <bolker at ufl.edu> 
wrote:
> This is not quite right because we have estimated the
> rate from the data -- from ?ks.test
> 
...
> 
> But perhaps not a bad start.

Actually, it is a very bad start. Using estimated parameters in tests 
like ks.test gives you a *completely* wrong distribution of the test 
statistic and the resulting p-value. Here's a simple example:

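library(MASS)   # for truehist()
n=20            # sample size
r=1             # true rate of the exponential distribution

# p-value when the true rate is used (a valid test)
f=function(n,r)
{
  x=rexp(n,rate=r)
  ks.test(x,"pexp",rate=r)$p.value
}
# p-value when the rate is estimated from the same data
g=function(n,r)
{
  x=rexp(n,rate=r)
  ks.test(x,"pexp",rate=1/mean(x))$p.value
}

# Under the null hypothesis, p-values should be uniform on [0,1]
truehist(replicate(1000, f(n,r)), h=.1, col="wheat")  # roughly uniform
truehist(replicate(1000, g(n,r)), h=.1, col="wheat")  # shifted towards 1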
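
The histograms can also be summarised numerically. The following is 
only a sketch, not part of the example above: the seed, the number of 
simulations (nsim) and the variable names are arbitrary choices. It 
estimates how often each variant rejects at the nominal 5% level when 
the null hypothesis is true.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 20       # sample size
r <- 1        # true rate
nsim <- 2000  # number of simulated data sets (arbitrary)

# p-values using the true rate
p_true <- replicate(nsim, {
  x <- rexp(n, rate = r)
  ks.test(x, "pexp", rate = r)$p.value
})

# p-values using the rate estimated from the same sample
p_est <- replicate(nsim, {
  x <- rexp(n, rate = r)
  ks.test(x, "pexp", rate = 1/mean(x))$p.value
})

# Fraction of null data sets rejected at the 5% level
rej_true <- mean(p_true < 0.05)
rej_est  <- mean(p_est  < 0.05)
print(c(true_rate = rej_true, estimated_rate = rej_est))
```

With the true rate the rejection fraction should be close to 0.05; with 
the estimated rate it should come out far lower.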

Note that increasing the number of observations n does *not* help. Also 
note that under the null hypothesis, the parameter estimation mostly 
affects the size of the test (its actual significance level), not its 
power: it *reduces* the probability of a type I error, and very much 
so, which makes the test conservative. I'm not sure what the effect 
under the alternative is, but I know that several papers have been 
written on this topic.
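
One common way around the problem is to simulate the null distribution 
of the test statistic with the rate re-estimated on every simulated 
sample (a Lilliefors-type Monte Carlo test). The following is a minimal 
sketch under that assumption, not a method proposed earlier in this 
thread. The function name ks_exp_mc and its argument nsim are 
hypothetical, introduced here for illustration.

```r
# Monte Carlo KS test for exponentiality with an estimated rate.
# ks_exp_mc and nsim are hypothetical names used for illustration.
ks_exp_mc <- function(x, nsim = 1000) {
  n <- length(x)

  # KS statistic with the rate estimated from the sample itself
  stat <- function(y) {
    unname(ks.test(y, "pexp", rate = 1/mean(y))$statistic)
  }
  d_obs <- stat(x)

  # Null distribution of the statistic. Because the rate is re-estimated
  # on each sample and the exponential is a scale family, the statistic
  # does not depend on the true rate, so rate = 1 is sufficient.
  d_sim <- replicate(nsim, stat(rexp(n, rate = 1)))

  # Monte Carlo p-value, with the usual +1 correction
  (1 + sum(d_sim >= d_obs)) / (nsim + 1)
}

set.seed(2)                      # arbitrary seed
p_exp  <- ks_exp_mc(rexp(20))    # data that really are exponential
p_unif <- ks_exp_mc(runif(50, 5, 6))  # clearly non-exponential data
print(c(exponential = p_exp, uniform = p_unif))
```

Only the test statistic from ks.test is used here; its reported p-value 
is ignored, since that is the part invalidated by the estimation.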

-- 
Karl Ove Hufthammer