[R] Problem with lm.resid() when weights are provided

Fox, John j|ox @end|ng |rom mcm@@ter@c@
Fri Sep 14 14:35:42 CEST 2018


Dear Hamed,

I don't think that anyone has picked up on this problem.

What's peculiar about your weights is that several are 0 within rounding error but not exactly 0:

> head(df)
           y          x       weight
1  1.5115614  0.5520924 2.117337e-34
2 -0.6365313 -0.1259932 2.117337e-34
3  0.3778278  0.4209538 4.934135e-31
4  3.0379232  1.4031545 2.679495e-24
5  1.5364652  0.4607686 2.679495e-24
6 -2.3772787 -0.7396358 6.244160e-21

I can reproduce the results that you report:

> (mod.1 <- lm(y ~ x, data=df))

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x  
   -0.04173      2.03790  

> max(resid(mod.1))
[1] 1.14046
> (mod.2 <- lm(y ~ x, data=df, weights=weight))

Call:
lm(formula = y ~ x, data = df, weights = weight)

Coefficients:
(Intercept)            x  
   -0.05786      1.96087  

> max(resid(mod.2))
[1] 36.84939

But the problem disappears when the tiny nonzero weight are set to 0:

> df2 <- df
> df2$weight <- zapsmall(df2$weight)
> head(df2)
           y          x weight
1  1.5115614  0.5520924      0
2 -0.6365313 -0.1259932      0
3  0.3778278  0.4209538      0
4  3.0379232  1.4031545      0
5  1.5364652  0.4607686      0
6 -2.3772787 -0.7396358      0
> (mod.3 <- update(mod.2, data=df2))

Call:
lm(formula = y ~ x, data = df2, weights = weight)

Coefficients:
(Intercept)            x  
   -0.05786      1.96087  

> max(resid(mod.3))
[1] 1.146663

I don't know exactly why this happens, but suspect numerical instability produced by the near-zero weights, which are smaller than the machine double-epsilon

> .Machine$double.neg.eps
[1] 1.110223e-16

The problem also disappears, e.g., if the tiny weight are set to 1e-15 rather than 0.

I hope this helps,
 John

-----------------------------------------------------------------
John Fox
Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: https://socialsciences.mcmaster.ca/jfox/



> -----Original Message-----
> From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of Hamed Ha
> Sent: Tuesday, September 11, 2018 8:39 AM
> To: r-help using r-project.org
> Subject: [R] Problem with lm.resid() when weights are provided
> 
> Dear R Help Team.
> 
> I get some weird results when I use the lm function with weight. The issue can
> be reproduced by the example below:
> 
> 
> The input data is (weights are intentionally designed to reflect some
> structures in the data)
> 
> 
> > df
> y x weight
>  1.51156139  0.55209240 2.117337e-34
> -0.63653132 -0.12599316 2.117337e-34
>  0.37782776  0.42095384 4.934135e-31
>  3.03792318  1.40315446 2.679495e-24
>  1.53646523  0.46076858 2.679495e-24
> -2.37727874 -0.73963576 6.244160e-21
>  0.37183065  0.20407468 1.455107e-17
> -1.53917553 -0.95519361 1.455107e-17
>  1.10926675  0.03897129 3.390908e-14
> -0.37786333 -0.17523593 3.390908e-14
>  2.43973603  0.97970095 7.902000e-11
> -0.35432394 -0.03742559 7.902000e-11
>  2.19296613  1.00355263 4.289362e-04
>  0.49845532  0.34816207 4.289362e-04
>  1.25005260  0.76306225 5.000000e-01
>  0.84360691  0.45152356 5.000000e-01
>  0.29565993  0.53880068 5.000000e-01
> -0.54081334 -0.28104525 5.000000e-01
>  0.83612836 -0.12885659 9.995711e-01
> -1.42526769 -0.87107631 9.999998e-01
>  0.10204789 -0.11649899 1.000000e+00
>  1.14292898  0.37249631 1.000000e+00
> -3.02942081 -1.28966997 1.000000e+00
> -1.37549764 -0.74676145 1.000000e+00
> -2.00118016 -0.55182759 1.000000e+00
> -4.24441674 -1.94603608 1.000000e+00
>  1.17168144  1.00868008 1.000000e+00
>  2.64007761  1.26333069 1.000000e+00
>  1.98550114  1.18509599 1.000000e+00
> -0.58941683 -0.61972416 9.999998e-01
> -4.57559611 -2.30914920 9.995711e-01
> -0.82610544 -0.39347576 9.995711e-01
> -0.02768220  0.20076910 9.995711e-01
>  0.78186399  0.25690215 9.995711e-01
> -0.88314153 -0.20200148 5.000000e-01
> -4.17076452 -2.03547588 5.000000e-01
>  0.93373070  0.54190626 4.289362e-04
> -0.08517734  0.17692491 4.289362e-04
> -4.47546619 -2.14876688 4.289362e-04
> -1.65509103 -0.76898087 4.289362e-04
> -0.39403030 -0.12689705 4.289362e-04
>  0.01203300 -0.18689898 1.841442e-07
> -4.82762639 -2.31391121 1.841442e-07
> -0.72658380 -0.39751171 3.397282e-14
> -2.35886866 -1.01082109 0.000000e+00
> -2.03762707 -0.96439902 0.000000e+00
>  0.90115123  0.60172286 0.000000e+00
>  1.55999194  0.83433953 0.000000e+00
>  3.07994058  1.30942776 0.000000e+00
>  1.78871462  1.10605530 0.000000e+00
> 
> 
> 
> Running simple linear model returns:
> 
> > lm(y~x,data=df)
> 
> Call:
> lm(formula = y ~ x, data = df)
> 
> Coefficients:
> (Intercept)            x
>    -0.04173      2.03790
> 
> and
> > max(resid(lm(y~x,data=df)))
> [1] 1.14046
> 
> 
> *HOWEVER if I use the weighted model then:*
> 
> lm(formula = y ~ x, data = df, weights = df$weights)
> 
> Coefficients:
> (Intercept)            x
>    -0.05786      1.96087
> 
> and
> > max(resid(lm(y~x,data=df,weights=df$weights)))
> [1] 60.91888
> 
> 
> as you see, the estimation of the coefficients are nearly the same but the
> resid() function returns a giant residual (I have some cases where the value is
> much much higher). Further, if I calculate the residuals by simply
> predict(lm(y~x,data=df,weights=df$weights))-df$y then I get the true value for
> the residuals.
> 
> 
> Thanks.
> 
> Please do not hesitate to contact me for more details.
> Regards,
> Hamed.
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list