[R] inverse prediction and Poisson regression

Fri Jul 25 22:33:12 CEST 2003

Thanks Peter, for the wonderful illustration of various model-fitting 
options for the non-linear models. 

Thinking about your comment that the "variance stabilizing" transform 
does better than the weighted non-linear least-squars - I am 
interpreting this to mean that the residual sum of squares is smaller, 
0.16 versus 0.64.  A possible explanation may be that in the former 
case the responses are actually smaller because of the square-rooting 
(range about 3-5), so the residuals are smaller, whereas in the "gnls" 
case the response is the original variable (range about 10-30). Does 
this seem reasonable?

Ravi.

----- Original Message -----
From: Peter Dalgaard BSA <p.dalgaard at biostat.ku.dk>
Date: Friday, July 25, 2003 3:48 pm
Subject: Re: [R] inverse prediction and Poisson regression

> Spencer Graves <spencer.graves at pdf.com> writes:
> 
> > The problem with nls is that it is NOT maximum likelihood for the
> > Poisson distribution.  For the Poisson, the standard deviation 
> is the
> > square root of the mean, while nls assumes constant standard
> > deviation. That's why I stayed with glm.  The answers should be
> > comparable, but I would prefer the glm answers.  spencer graves
> 
> That's why I was suggesting either a variance stabilizing
> transformation or using gnls() with a weights function. In both cases
> you need to watch out for scaling of SE's stemming from the fact that
> the variance is supposed to be known in the Poisson distribution, but
> the fitting routines estimate a sigma from the residuals anyway. 
> 
> I.e. for instance:
> 
> > Phyto.nls2 <- nls(sqrt(y+.5) ~ sqrt(Ymax/(1 +
> > x/x50)+.5),data=Phytopath,start=list(Ymax=20.0,x50=0.01),trace=T)
> 9.400602 :  20.00  0.01
> 0.9064723 :  27.21511020  0.03593643
> 0.06338235 :  28.21654101  0.05995647
> 0.02616589 :  28.50221759  0.06815302
> 0.02608587 :  28.54871243  0.06835046
> 0.02608586 :  28.54960242  0.06834083
> 0.02608586 :  28.5495605  0.0683413
> > summary(Phyto.nls2)
> 
> Formula: sqrt(y + 0.5) ~ sqrt(Ymax/(1 + x/x50) + 0.5)
> 
> Parameters:
>     Estimate Std. Error t value Pr(>|t|)
> Ymax 28.54956    1.65113  17.291   0.0368 *
> x50   0.06834    0.01264   5.407   0.1164
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> 
> Residual standard error: 0.1615 on 1 degrees of freedom
> 
> Correlation of Parameter Estimates:
>       Ymax
> x50 -0.6833
> 
> but here you need to know that the theoretical sd is 0.5, so the Std.
> errors all need multiplication by 0.5/0.1615.
> 
> Using gnls() we'd get (if I got the calling sequence right...)
> 
> > Phyto.gnls <- gnls(y ~ Ymax/(1 + x/x50),
> +  
> data=Phytopath,weights=varPower(fixed=.5),start=list
(Ymax=20.0,x50=0.01))> summary(Phyto.gnls)
> Generalized nonlinear least squares fit
>  Model: y ~ Ymax/(1 + x/x50)
>  Data: Phytopath
>       AIC      BIC    logLik
>  13.71292 11.00875 -3.856458
> 
> Variance function:
> Structure: Power of variance covariate
> Formula: ~fitted(.)
> Parameter estimates:
> power
>  0.5
> 
> Coefficients:
>         Value Std.Error   t-value p-value
> Ymax 29.393796 2.4828723 11.838626  0.0536
> x50   0.063028 0.0125244  5.032376  0.1249
> 
> Correlation:
>    Ymax
> x50 -0.84
> 
> Standardized residuals:
> [1] -0.4894266  0.7621749 -0.4237346
> attr(,"std")
> [1] 2.8478142 1.4239071 0.8586483
> attr(,"label")
> [1] "Standardized residuals"
> 
> Residual standard error: 0.6367906
> Degrees of freedom: 3 total; 1 residual
> 
> and again, it is necessary to adjust the SE's by multiplying with
> 1/0.6368
> 
> It is in fact rather easy to do direct MLE for this kind of model:
> 
> > with(Phytopath,
> > optim(c(28,.7),fn=function(p){Ymax<-p[1];x50<-p[2];
> + -sum(dpois(y,lambda=Ymax/(1
> + + x/x50),log=TRUE))}, method="BFGS", hessian=T))
> $par
> [1] 28.55963487  0.06833524
> 
> $value
> [1] 7.21251
> 
> $counts
> function gradient
>      42       13
> 
> $convergence
> [1] 0
> 
> $message
> NULL
> 
> $hessian
>           [,1]        [,2]
> [1,] 0.07356072    6.631868
> [2,] 6.63186764 1294.792539
> 
> Warning message:
> NaNs produced in: dpois(x, lambda, log)
> 
> and we can get SE's from the inverse hessian with
> 
> > sqrt(diag(solve(with(Phytopath,
> + optim(c(28,.7),fn=function(p){Ymax<-p[1];x50<-p[2];
> + -sum(dpois(y,lambda=Ymax/(1
> + + x/x50),log=TRUE))}, method="BFGS", hessian=T))$hessian)))
> [1] 5.02565893 0.03788052
> 
> Notice that the variance stabilizing transform seems to do rather
> better than gnls() compared to true MLE. I'm slightly puzzled by this,
> but presumably it has to do with the fact that MLE for a Gaussian
> model with a mean/variance relationship is *not* identical to the
> iteratively reweighted least squares that glm() et al. are doing. (I
> wouldn't quite rule out that I've simply made a mistake somewhere,
> though...)
> 
> -- 
>   O__  ---- Peter Dalgaard             Blegdamsvej 3  
>  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
> (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 
> 35327918~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: 
> (+45) 35327907
>