[R] discrepancy between R & Splus lm.influence() for family=Gamma

Fri Sep 12 09:16:49 CEST 2003

>>>>> "Andrew" == Andrew Hill <AHill at wyeth.com>
>>>>>     on Thu, 11 Sep 2003 17:16:30 -0400 writes:

    Andrew> Hello, I am looking for an explanation and/or fix
    Andrew> for a discrepancy in the behaviour of the R
    Andrew> lm.influence() function [ version R 1.5.0
    Andrew> (2002-04-29) ] and the same function in Splus [
    Andrew> Splus version 5.1 release 1, running on SGI IRIX
    Andrew> 6.2].  The discrepancy is of concern because I am
    Andrew> migrating some Splus scripts to R and need to ensure
    Andrew> consistency of results.

Before reading on: 
Do you really mean R 1.5.0?  
If yes, you should definitely upgrade to R 1.7.1!

There were considerable improvements (for R 1.7.0) for these
functions, mostly thanks to John Fox, and the recommended way in
R is to use influence() which is a generic function that has both an "lm"
and (important for you!) a "glm" method.

I.e., you call  influence(mylmfit, ...)  on your fitted glm object, and
the method  influence.glm(mylmfit, ...)  will be dispatched.
This should do the correct calculations for all kinds of glm models.
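
A minimal sketch of such a call, assuming a reasonably current R rather
than 1.5.0. The data are simulated purely for illustration, and the names
`d`, `fit` and `infl` are made up for this example; they are not from the
original scripts.

```r
# Simulated Gamma response with a linear mean (illustrative data only).
set.seed(1)
x  <- runif(30, 1, 10)
mu <- 2 + 3 * x
y  <- rgamma(30, shape = 5, rate = 5 / mu)   # E[y] = mu
d  <- data.frame(x = x, y = y)

# Gamma family with identity link, as in the question.
# Starting values are supplied only to keep the fit well behaved.
fit <- glm(y ~ x, family = Gamma(link = "identity"),
           data = d, start = c(2, 3))

# influence() is generic; for a "glm" object the glm method is used.
infl <- influence(fit)

names(infl)                # components returned for the glm fit
head(infl$hat)             # leverages
head(infl$coefficients)    # per-case changes in the coefficients
head(infl$sigma)           # per-case leave-one-out scale estimates
```

The components to compare against the Splus output are then infl$hat,
infl$coefficients and infl$sigma.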

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><

    Andrew> Specifically, when I fit a glm() model to a test
    Andrew> dataset using the family = Gamma(link=identity), and
    Andrew> then call lm.influence on the fitted glm object, the
    Andrew> resulting lm.influence()$coefficients and
    Andrew> lm.influence()$sigma values are different between R
    Andrew> and Splus versions.  The lm.influence()$hat vector
    Andrew> does agree between the two programs.  Also, the
    Andrew> glm() function does return the same model
    Andrew> coefficient in both R and Splus.

    Andrew> In contrast, if I use the default glm
    Andrew> family=Gaussian(link=identity), all output of
    Andrew> lm.influence() for both R and Splus does agree fully
    Andrew> for my dataset.

    Andrew> I have read the R help function for lm.influence()
    Andrew> and I understand that R returns the difference
    Andrew> between the model coefficients and the drop-one
    Andrew> coefficients, while Splus returns the drop-one
    Andrew> coefficients.  But this does not account for the
    Andrew> discrepancy that I see in the
    Andrew> lm.influence$coefficients, nor the difference in
    Andrew> lm.influence$sigma, at least to my understanding.
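
For an ordinary lm fit, the relation between the two conventions can be
checked by brute force. The sketch below uses the built-in `cars` data and
assumes the convention of current R, where row i of $coefficients is the
full-fit coefficient vector minus the drop-one coefficient vector. The
names `fit_i` and `drop_one` are illustrative. For a glm fit the
influence values are, as far as I understand, computed from the final
weighted least-squares step rather than from a full refit, so a
brute-force refit would not be expected to agree exactly there.

```r
fit  <- lm(dist ~ speed, data = cars)
infl <- influence(fit)

i <- 1                                          # case to drop
fit_i <- lm(dist ~ speed, data = cars[-i, ])    # explicit refit without case i

# Reconstruct S-style drop-one coefficients from R's changes
# (assumes R returns full-fit minus drop-one).
drop_one <- coef(fit) - infl$coefficients[i, ]

# Both comparisons should agree up to numerical tolerance for an lm fit.
all.equal(unname(drop_one), unname(coef(fit_i)))
all.equal(unname(infl$sigma[i]), summary(fit_i)$sigma)
```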

    Andrew> Pasted below is output, first from R, and second
    Andrew> from Splus, which illustrates the issue.
    Andrew> Discrepancies between the R and Splus $sigma values
    Andrew> look like ~ 2-6%.  Hopefully I have not overlooked
    Andrew> an obvious statistical explanation for the
    Andrew> difference.

    Andrew> Thanks, Andrew.