[Rd] standardized residuals (rstandard & plot.lm) (PR#8468)

Heather.Turner@warwick.ac.uk Heather.Turner at warwick.ac.uk
Tue Jan 10 14:30:23 CET 2006


This bug is not quite fixed - the example from my original report now =
works using R-2.2.1, but

plot(Uniform, 6)

does not. The bug is due to
 if (show[6]) {
        ymx <- max(cook, na.rm =3D TRUE) * 1.025
        g <- hatval/(1 - hatval) # Potential division by zero here #
        plot(g, cook, xlim =3D c(0, max(g)), ylim =3D c(0, ymx),=20
            main =3D main, xlab =3D "Leverage", ylab =3D "Cook's distance",=
=20
            xaxt =3D "n", type =3D "n", ...)
...

All other values of 'which' seem to work fine. Sorry not to have checked =
this in the beta version,

Heather

>>> Prof Brian Ripley <ripley at stats.ox.ac.uk> 12/06/05 04:10pm >>>
Curiously, I was just looking at that, since I believe the answer =
should=20
be NaN, and some optimizing compilers/fast BLASes are not giving that.
(There's an example in reg-test-3.R.)  So I think we need to return NaN=20
when hat is within rounding error of 1.

My take is that plot.lm should handle this: you will see most but not =
all=20
cases have na.rm=3DTRUE in calculating ylim, but as Inf is theoretically
impossible it has not been considered.

Note that plot.lm does not use rstandard and so needs a separate fix.

Thanks for the report

On Tue, 6 Dec 2005 Heather.Turner at warwick.ac.uk wrote:

> Full_Name: Heather Turner
> Version: 2.2.0
> OS: Windows XP
> Submission from: (NULL) (137.205.240.44)
>
>
> Standardized residuals as calculated by rstandard.lm, rstandard.glm and =
plot.lm
> are Inf/NaN rather than zero when the un-standardized residuals are =
zero. This
> causes plot.lm to break when calculating 'ylim' for any of the plots of
> standardized residuals. Example:
>
> "occupationalStatus" <-
>    structure(as.integer(c(50, 16, 12, 11, 2, 12, 0, 0, 19, 40, 35,
>                           20, 8, 28, 6, 3, 26, 34, 65, 58, 12, 102, 19, =
14, 8,
>                           18, 66, 110, 23, 162, 40, 32, 7, 11, 35, 40, =
25, 90,
>                           21, 15, 11, 20, 88, 183, 46, 554, 158, 126, 6, =
8,
> 23,
>                           64, 28, 230, 143, 91, 2, 3, 21, 32, 12, 177, =
71,
> 106)
>                         ), .Dim =3D as.integer(c(8, 8)), .Dimnames =3D
>              structure(list(origin =3D c("1", "2", "3", "4", "5", "6", =
"7",
> "8"),
>                             destination =3D c("1", "2", "3", "4", "5", =
"6", "7",
>                             "8")), .Names =3D c("origin", "destination"))=
,
>              class =3D "table")
> Diag <- as.factor(diag(1:8))
> Rscore <- scale(as.numeric(row(occupationalStatus)), scale =3D FALSE)
> Cscore <- scale(as.numeric(col(occupationalStatus)), scale =3D FALSE)
> Uniform <- glm(Freq ~ origin + destination + Diag +
>               Rscore:Cscore, family =3D poisson, data =3D occupationalSta=
tus)
> residuals(Uniform)[as.logical(diag(8))] #zero/near-zero
> rstandard(Uniform)[as.logical(diag(8))] #mostly Inf/NaN
> plot(Uniform) #breaks on qqnorm plot (or any 'which' > 1)
>
> This could be fixed by replacing standardized residuals with zero where =
the hat
> value is one, e.g.
> rstandard.glm <- function (model,
>                            infl =3D lm.influence(model, do.coef =3D =
FALSE),
>                            ...) {
>     res <- infl$wt.res
>     hat <- infl$hat
>     ifelse(hat =3D=3D 1, 0, res/sqrt(summary(model)$dispersion * (1 -
> infl$hat)))
> }
> etc.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel=20
>
>

--=20
Brian D. Ripley,                  ripley at stats.ox.ac.uk=20
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/=20
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list