[R] second try; writing user-defined GLM link function

Tue Apr 18 01:23:41 CEST 2006

On Mon, 17 Apr 2006, John Fox wrote:

> Dear Mark,
>
> IWLS is a method for obtaining ML estimates in GLMs.
>
> The dummy variables for representing factors (in R terminology, CLASS
> variables in SAS) are treated differently. SAS uses a full-rank
> parametrization of the model matrix, but sets the coefficient for one
> category in a factor to 0, so the effect is equivalent to dummy-coding --
> that is the default "treatment" contrast coding in R. Note, however, that in
> the SAS output the coefficient for parastat == 0 is estimated and that for
> parastat == 1 is set to 0; this is the opposite of what happens by default
> in R. If you look more closely at the output, you'll see that the
> coefficient for parastat in R is of the same magnitude as in SAS but of the
> opposite sign. The same is true for the factor patsize. Note that the
> standard errors in the two outputs differ because in R you're estimating the
> dispersion parameter while in SAS it's fixed to 1.
>
> The upshot is that (except for the dispersion parameter) you have equivalent
> but not identical parametrizations of the model.

You can (and probably should) set dispersion=1 when calling summary. That 
would happen automatically if you set the family name (not the function 
name but the "family" component) to "binomial" (which is what it is, 
albeit with an additional link).

>
> I hope this helps,
> John
>
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Mark Herzog
>> Sent: Monday, April 17, 2006 11:16 AM
>> To: Jessi Brown
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] second try; writing user-defined GLM link function
>>
>> I was a little hesitant to post to everyone until I figured
>> out why there is a discrepancy in the intercept estimates
>> when compared to the same model run in SAS vs. R.  Everything
>> else comes out correctly, including the other coefficient
>> estimates... so perhaps it is just the numerical method used.
>>  I think glm in R is using IWLS, and SAS is using ML.
>>
>> If anyone has another idea feel free to let me know.  Here is
>> my modified binomial family, now called logexposure.  Since
>> there is no other choice for the link for this new
>> logexposure "family", I have just embedded the make.link
>> inside the logexposure family.
>>
>> Good luck and I'd appreciate any comments.
>>
>> logexposure<-function (link="logit",ExposureDays) {
>>      logexp<-make.link("logit")
>>      logexp$linkfun<-function(mu,expos=ExposureDays) {
>>          log((mu^(1/expos))/(1-mu^(1/expos)))
>>      }
>>      logexp$linkinv<-function(eta,expos=ExposureDays) {
>>          (exp(eta)/(1+exp(eta)))^expos
>>      }
>>      logexp$mu.eta<-function(eta,expos=ExposureDays) {
>>          (expos*exp(eta*expos)*(1+exp(eta))^(-expos-1))
>>      }
>>      variance <- function(mu) mu * (1 - mu)
>>      validmu <- function(mu) all(mu > 0) && all(mu < 1)
>>      dev.resids <- function(y, mu, wt) .Call("binomial_dev_resids",
>>          y, mu, wt, PACKAGE = "stats")
>>      aic <- function(y, n, mu, wt, dev) {
>>          m <- if (any(n > 1))
>>              n
>>          else wt
>>          -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m *
>>              y), round(m), mu, log = TRUE))
>>      }
>>      initialize <- expression({
>>          if (NCOL(y) == 1) {
>>              if (is.factor(y)) y <- y != levels(y)[1]
>>              n <- rep.int(1, nobs)
>>              if (any(y < 0 | y > 1)) stop("y values must be 0
>> <= y <= 1")
>>              mustart <- (weights * y + 0.5)/(weights + 1)
>>              m <- weights * y
>>              if (any(abs(m - round(m)) > 0.001))
>>                   warning("non-integer successes in a binomial glm!")
>>          } else if (NCOL(y) == 2) {
>>              if (any(abs(y - round(y)) > 0.001))
>>                  warning("non-integer counts in a binomial glm!")
>>              n <- y[, 1] + y[, 2]
>>              y <- ifelse(n == 0, 0, y[, 1]/n)
>>              weights <- weights * n
>>              mustart <- (n * y + 0.5)/(n + 1)
>>          } else stop("for the binomial family,",
>>                  " y must be a vector of 0 and 1's\n",
>>              "or a 2 column matrix where col 1 is",
>>              " no. successes and col 2 is no. failures")
>>      })
>>      structure(list(family="logexposure", link=logexp,
>> linkfun=logexp$linkfun,
>>          linkinv=logexp$linkinv, variance=variance,
>> dev.resids=dev.resids,
>>          aic=aic, mu.eta=logexp$mu.eta,
>> initialize=initialize, validmu=validmu,
>>          valideta=logexp$valideta), class = "family") }
>>
>> #Example
>> nestdata<-read.table("http://www.branta.org/ShafferChatNestDat
>> a/chat.txt")
>> chat.glm.logexp<-glm(survive/trials~parastat+nest_ht*patsize,
>> 	family=logexposure(ExposureDays=nestdata$expos),data=nestdata)
>> # if you have MASS installed
>> library(MASS)
>> chat.step<-stepAIC(chat.glm.logexp,scope=list(upper=~parastat+
>> nest_ht*patsize,lower=~1))
>> chat.step$anova
>> summary(chat.step)
>>
>> #Not run:
>> # note this compares to following results from SAS :
>> #              Criteria For Assessing Goodness Of Fit
>> #
>> #    Criterion                 DF           Value        Value/DF
>> #    Deviance                 289        193.9987          0.6713
>> #    Scaled Deviance          289        193.9987          0.6713
>> #    Pearson Chi-Square       289        537.8609          1.8611
>> #    Scaled Pearson X2        289        537.8609          1.8611
>> #    Log Likelihood                      -96.9994
>> #    Algorithm converged.
>> #                       Analysis Of Parameter Estimates
>> #
>> #                                Standard Wald 95%   Chi-
>> #Parameter   DF  Estimate Error        Limits       Square Pr > ChiSq
>> ##Intercept   1  2.6973   0.2769   2.1546  3.2399   94.92  <.0001
>> #parastat0    1 -1.0350   0.5201  -2.0544  -0.0155  3.96   0.0466
>> #parastat1    0  0.0000   0.0000   0.0000  0.0000     .      .
>> #patsizelarge 1  1.0844   0.5094   0.0861  2.0827   4.53   0.0333
>> #patsizesmall 0  0.0000   0.0000   0.0000  0.0000     .       .
>> #Scale        0  1.0000   0.0000   1.0000  1.0000
>> #
>> #NOTE: The scale parameter was held fixed.
>>
>>
>>
>> Mark Herzog, Ph.D.
>> Program Leader, San Francisco Bay Research Wetland Division,
>> PRBO Conservation Science 4990 Shoreline Highway 1 Stinson
>> Beach, CA 94970
>> (415) 893-7677 x308
>> mherzog at prbo.org
>>
>> Jessi Brown wrote:
>>> I apologize for my earlier posting that, unbeknownst to me before,
>>> apparently was not in the correct format for this list.
>> Hopefully this
>>> attempt will go through, and no-one will hold the newbie mistake
>>> against me.
>>>
>>> I could really use some help in writing a new glm link function in
>>> order to run an analysis of daily nest survival rates. I've
>> struggled
>>> with this for weeks now, and can at least display the
>> contents of the
>>> glm function, but I'm afraid I can't figure out even how to get
>>> started at modifiying the appropriate section (fairly new at R,
>>> complete beginner in writing functions in R!).
>>>
>>> Essentially, all I will be doing is running a logistic
>> regression, but
>>> with a different link function. The link function is a
>> modification of
>>> the logit link:
>>> g(theta) = natural log( (theta ^(1/t)) / (1- (theta ^(1/t))
>> ) ; where
>>> t is the length of the interval between nest checks.
>>>
>>> Could anyone help? I hope the answer is rather simple,
>> since this just
>>> adds the exponent (1/t) to the logit link function; but I
>> have yet to
>>> figure out how to do this.
>>>
>>> Thanks in advance for any help.
>>>
>>> cheers, Jessi Brown
>>> Program in Ecology, Evolution, and Conservation Biology
>> University of
>>> Nevada-Reno
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>>
>>>
>>>
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595