[R] second try; writing user-defined GLM link function

Mark Herzog mherzog at prbo.org
Mon Apr 17 18:16:15 CEST 2006


I was a little hesitant to post to everyone until I figured out why 
there is a discrepancy in the intercept estimates when compared to the 
same model run in SAS vs. R.  Everything else comes out correctly, 
including the other coefficient estimates... so perhaps it is just the 
numerical method used.  I think glm in R is using IWLS, and SAS is using ML.

If anyone has another idea feel free to let me know.  Here is my 
modified binomial family, now called logexposure.  Since there is no 
other choice for the link for this new logexposure "family", I have just 
embedded the make.link inside the logexposure family.

Good luck and I'd appreciate any comments.

logexposure<-function (link="logit",ExposureDays) {
     logexp<-make.link("logit")
     logexp$linkfun<-function(mu,expos=ExposureDays) {
         log((mu^(1/expos))/(1-mu^(1/expos)))
     }
     logexp$linkinv<-function(eta,expos=ExposureDays) {
         (exp(eta)/(1+exp(eta)))^expos
     }
     logexp$mu.eta<-function(eta,expos=ExposureDays) {
         (expos*exp(eta*expos)*(1+exp(eta))^(-expos-1))
     }
     variance <- function(mu) mu * (1 - mu)
     validmu <- function(mu) all(mu > 0) && all(mu < 1)
     dev.resids <- function(y, mu, wt) .Call("binomial_dev_resids",
         y, mu, wt, PACKAGE = "stats")
     aic <- function(y, n, mu, wt, dev) {
         m <- if (any(n > 1))
             n
         else wt
         -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m *
             y), round(m), mu, log = TRUE))
     }
     initialize <- expression({
         if (NCOL(y) == 1) {
             if (is.factor(y)) y <- y != levels(y)[1]
             n <- rep.int(1, nobs)
             if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1")
             mustart <- (weights * y + 0.5)/(weights + 1)
             m <- weights * y
             if (any(abs(m - round(m)) > 0.001))
                  warning("non-integer successes in a binomial glm!")
         } else if (NCOL(y) == 2) {
             if (any(abs(y - round(y)) > 0.001))
                 warning("non-integer counts in a binomial glm!")
             n <- y[, 1] + y[, 2]
             y <- ifelse(n == 0, 0, y[, 1]/n)
             weights <- weights * n
             mustart <- (n * y + 0.5)/(n + 1)
         } else stop("for the binomial family,",
                 " y must be a vector of 0 and 1's\n",
             "or a 2 column matrix where col 1 is",
             " no. successes and col 2 is no. failures")
     })
     structure(list(family="logexposure", link=logexp, 
linkfun=logexp$linkfun,
         linkinv=logexp$linkinv, variance=variance, dev.resids=dev.resids,
         aic=aic, mu.eta=logexp$mu.eta, initialize=initialize, 
validmu=validmu,
         valideta=logexp$valideta), class = "family")
}

#Example
nestdata<-read.table("http://www.branta.org/ShafferChatNestData/chat.txt")
chat.glm.logexp<-glm(survive/trials~parastat+nest_ht*patsize,
	family=logexposure(ExposureDays=nestdata$expos),data=nestdata)
# if you have MASS installed
library(MASS)
chat.step<-stepAIC(chat.glm.logexp,scope=list(upper=~parastat+nest_ht*patsize,lower=~1))
chat.step$anova
summary(chat.step)

#Not run:
# note this compares to following results from SAS :
#              Criteria For Assessing Goodness Of Fit
#
#    Criterion                 DF           Value        Value/DF
#    Deviance                 289        193.9987          0.6713
#    Scaled Deviance          289        193.9987          0.6713
#    Pearson Chi-Square       289        537.8609          1.8611
#    Scaled Pearson X2        289        537.8609          1.8611
#    Log Likelihood                      -96.9994
#    Algorithm converged.
#                       Analysis Of Parameter Estimates
#
#                                Standard Wald 95%   Chi-
#Parameter   DF  Estimate Error        Limits       Square Pr > ChiSq
##Intercept   1  2.6973   0.2769   2.1546  3.2399   94.92  <.0001
#parastat0    1 -1.0350   0.5201  -2.0544  -0.0155  3.96   0.0466
#parastat1    0  0.0000   0.0000   0.0000  0.0000     .      .
#patsizelarge 1  1.0844   0.5094   0.0861  2.0827   4.53   0.0333
#patsizesmall 0  0.0000   0.0000   0.0000  0.0000     .       .
#Scale        0  1.0000   0.0000   1.0000  1.0000
#
#NOTE: The scale parameter was held fixed.



Mark Herzog, Ph.D.
Program Leader, San Francisco Bay Research
Wetland Division, PRBO Conservation Science
4990 Shoreline Highway 1
Stinson Beach, CA 94970
(415) 893-7677 x308
mherzog at prbo.org

Jessi Brown wrote:
> I apologize for my earlier posting that, unbeknownst to me before, 
> apparently was not in the correct format for this list. Hopefully this
> attempt will go through, and no-one will hold the newbie mistake
> against me.
> 
> I could really use some help in writing a new glm link function in
> order to run an analysis of daily nest survival rates. I've struggled
> with this for weeks now, and can at least display the contents of the
> glm function, but I'm afraid I can't figure out even how to get
> started at modifiying the appropriate section (fairly new at R,
> complete beginner in writing functions in R!).
> 
> Essentially, all I will be doing is running a logistic regression, but
> with a different link function. The link function is a modification of
> the logit link:
> g(theta) = natural log( (theta ^(1/t)) / (1- (theta ^(1/t)) ) ;
> where t is the length of the interval between nest checks.
> 
> Could anyone help? I hope the answer is rather simple, since this just
> adds the exponent (1/t) to the logit link function; but I have yet to
> figure out how to do this.
> 
> Thanks in advance for any help.
> 
> cheers, Jessi Brown
> Program in Ecology, Evolution, and Conservation Biology
> University of Nevada-Reno
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 
> 
>




More information about the R-help mailing list