[R] inconsistent lm results with fixed response variable

Tue Jan 20 22:01:32 CET 2009

Oh for Pete's sake!

Computers use floating point arithmetic.  Your residual standard  
error in
case 2 (i.e. 1.44e-14) *is* 0, but floating point arithmetic can't  
quite see
that this is so. Put in a check for the RSE being 0, and ``over- 
ride'' the
adjusted R squared to be NA (or NaN, or whatever floats your boat) in  
such instances.
The all.equal() function might be useful to you:

 > x <- 1.44e-14
 > all.equal(x,0)
[1] TRUE

(Caution:  Trap for Young Players:  If x and y are ``really'' different,
then all.equal(x,y) doesn't return FALSE as you might expect, but rather
a description of the difference between x and y --- which may be  
complicated
if x and y are complicated objects.  The function isTRUE() is useful  
here.)

	cheers,

		Rolf Turner

On 21/01/2009, at 9:21 AM, tyler wrote:

> Hi,
>
> I'm analyzing a large number of simulations using lm(), a sample of  
> the
> resulting data is pasted below. In some simulations, the response
> variable doesn't vary, ie:
>
>> tmp[[2]]$richness
>  [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  
> 40 40 40 40
>
> When I analyze this using R version 2.8.0 (2008-10-20) on a linux
> cluster, I get an appropriate result:
>
>
> ## begin R ##
>
> summary(lm(richness ~ het, data = tmp[[2]]))
>
> Call:
> lm(formula = richness ~ het, data = tmp[[2]])
>
> Residuals:
>    Min     1Q Median     3Q    Max
>      0      0      0      0      0
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)       40          0     Inf   <2e-16 ***
> het                0          0      NA       NA
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0 on 23 degrees of freedom
> Multiple R-squared:      ,      Adjusted R-squared:
> F-statistic:       on 1 and 23 DF,  p-value: NA
>
> ## end R ##
>
> This is good, as when I extract the Adjusted R-squared and slope I get
> NaN and 0, which are easily identified in my aggregate analysis, so I
> can deal with them appropriately.
>
> However, this isn't always the case:
>
> ## begin R ##
>
>  tmp[[1]]$richness
>  [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  
> 40 40 40 40
> [26] 40 40 40 40 40 40 40 40 40 40 40
>
>  summary(lm(richness ~ het, data = tmp[[1]]))
>
> Call:
> lm(formula = richness ~ het, data = tmp[[1]])
>
> Residuals:
>        Min         1Q     Median         3Q        Max
> -8.265e-14  1.689e-15  2.384e-15  2.946e-15  4.022e-15
>
> Coefficients:
>              Estimate Std. Error   t value Pr(>|t|)
> (Intercept) 4.000e+01  8.418e-15 4.752e+15   <2e-16 ***
> het         1.495e-14  4.723e-14 3.160e-01    0.754
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 1.44e-14 on 34 degrees of freedom
> Multiple R-squared: 0.5112,     Adjusted R-squared: 0.4968
> F-statistic: 35.56 on 1 and 34 DF,  p-value: 9.609e-07
>
> ## end R ##
>
> This is a problem, as when I plot the adj. R sq as part of an  
> aggregate
> analysis of a large number of simulations, it appears to be a very
> strong regression. I wouldn't have caught this except it was
> exceptionally high for the simulation parameters. It also differs by
> more than rounding error from the results with R 2.8.1 running on my
> laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
> Furthermore, on my laptop, none of the analyses produce a NaN adj.  
> R sq,
> even for data that do produce that result on the cluster.
>
> Both my laptop and the linux cluster have na.action set to na.omit. Is
> there something else I can do to ensure that lm() returns slope == 0
> and adj.R.sq == NaN when the response variable is fixed?
>
> Thanks for any suggestions,
>
> Tyler
>
> Data follows:
>
> `tmp` <-
> list(structure(list(richness = c(40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40), range = c(0.655084651733024, 0.579667533660137,  
> 0.433092220907644,
> 0.62937198839679, 0.787891987978164, 0.623511540624239,  
> 0.542744487102066,
> 0.905937570175433, 0.806802881350753, 0.680413208666325,  
> 0.873426339019084,
> 0.699982832956593, 0.697716600618959, 0.952729864926405,  
> 0.782938474636578,
> 1.03899695305995, 0.715075858219333, 0.579749205792549,  
> 1.20648999819246,
> 0.648677938600964, 0.651883559714785, 0.997318331273967,  
> 0.926368116052012,
> 0.91001274146868, 1.20737951037620, 1.12006560586723,  
> 1.09806272133903,
> 0.9750792390176, 0.356496202035743, 0.612018080768747,  
> 0.701905693862144,
> 0.735857916053381, 0.991787489781244, 1.07247435214078,  
> 0.60061903319766,
> 0.699733090379818), het = c(0.154538307084452, 0.143186508136608,
> 0.0690948358402777, 0.132337152911839, 0.169037344105692,  
> 0.117783183361602,
> 0.117524251767612, 0.221161206774407, 0.204574928003633,  
> 0.170571000779693,
> 0.204489357007294, 0.131749663515638, 0.154127894997213,  
> 0.232672587431942,
> 0.198610891796736, 0.260497696582693, 0.129028191256682,  
> 0.128717975847452,
> 0.254300896783617, 0.113546727236817, 0.142220347446853,  
> 0.24828642688332,
> 0.194340945175726, 0.190782985783610, 0.214676796387244,  
> 0.252940213066992,
> 0.22362832797347, 0.182423482989676, 0.0602332226418674,  
> 0.145400861749859,
> 0.141297315445974, 0.139798699247632, 0.222815139716421,  
> 0.211971297234962,
> 0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40)), .Names = c("richness", "range", "het", "n.rich")),
>  structure(list(richness = c(40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624,  
> 0.599708526308711,
> 0.714477274087683, 0.892359682406808, 0.868440625159371,  
> 0.753239521511417,
> 1.20164969658467, 1.20462111558583, 1.13142122690491,  
> 0.95241921975703,
> 1.13214481653550, 0.827528954009827, 1.14827745443481,  
> 0.936048043180592,
> 0.874649332193952, 1.38844778296649, 0.985016220913809,  
> 1.18166853164661,
> 0.784679773255876, 0.94894149080785, 0.770312904574722,  
> 1.10203660758219,
> 1.15624067277321, 0.692776967548628, 0.79343712876973),
> het = c(0.170481207967181,
> 0.108265674755723, 0.123316519598517, 0.220631611141464,  
> 0.160460967122565,
> 0.145032358811883, 0.293678286125082, 0.284769842125969,  
> 0.258637372765782,
> 0.18303781265474, 0.265304220319150, 0.194784967445680,  
> 0.248055723803990,
> 0.204658616507612, 0.167203828355069, 0.287030735881294,  
> 0.247639113771915,
> 0.269348295820692, 0.111409735752589, 0.209076579513581,  
> 0.176890183224181,
> 0.249378876987384, 0.260323833307383, 0.177061093736427,  
> 0.172263958005774
> ), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names = c 
> ("richness",
> "range", "het", "n.rich")))
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

######################################################################
Attention: 
This e-mail message is privileged and confidential. If you are not the 
intended recipient please delete the message and notify the sender. 
Any views or opinions presented are solely those of the author.

This e-mail has been scanned and cleared by MailMarshal 
www.marshalsoftware.com
######################################################################