[R] lm fails on some large input

peter dalgaard pd@|gd @end|ng |rom gm@||@com
Thu Apr 18 18:23:07 CEST 2019


Um, you need to reverse y and x there. The question was about lm(y ~ x)....

> X <- cbind(1, y)
> solve(crossprod(X))
Error in solve.default(crossprod(X)) : 
  system is computationally singular: reciprocal condition number = 6.19587e-35

Actually, lm can QR perfectly OK, but it gets caught by its singularity detection:

> qr <- qr(X, tol=1e-10)
> qr # without the tol bit, you get same thing but $rank == 1
$qr
                             y
 [1,] -3.0000000 -4.520117e+09
 [2,]  0.3333333 -3.426530e+01
 [3,]  0.3333333 -2.947103e-02
 [4,]  0.3333333  4.252164e-01
 [5,]  0.3333333 -3.665468e-01
 [6,]  0.3333333 -3.488029e-01
 [7,]  0.3333333  2.614064e-01
 [8,]  0.3333333  4.086982e-01
 [9,]  0.3333333  2.018556e-03

$rank
[1] 2

$qraux
[1] 1.333333 1.571779

$pivot
[1] 1 2

attr(,"class")
[1] "qr"
> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 101.632, 108.928, 94.08)
> qr.coef(qr,x)
                          y 
-2.403345e+09  1.595099e+00 

> lm(x~y)

Call:
lm(formula = x ~ y)

Coefficients:
(Intercept)            y  
      94.73           NA  

> lm(x~y, tol=1e-10)

Call:
lm(formula = x ~ y, tol = 1e-10)

Coefficients:
(Intercept)            y  
 -2.403e+09    1.595e+00  

> lm(x~I(y-mean(y)))

Call:
lm(formula = x ~ I(y - mean(y)))

Coefficients:
   (Intercept)  I(y - mean(y))  
        94.734           1.595  


> On 18 Apr 2019, at 17:56 , Fox, John <jfox using mcmaster.ca> wrote:
> 
> Dear Michael and Dingyuan Wang,
> 
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of Michael
>> Dewey
>> Sent: Thursday, April 18, 2019 11:25 AM
>> To: Dingyuan Wang <gumblex using aosc.io>; r-help using r-project.org
>> Subject: Re: [R] lm fails on some large input
>> 
>> Perhaps subtract 1506705766 from y?
>> 
>> Saying some other software does it well implies you know what the _correct_
>> answer is here but I would question what that means with this sort of data-
>> set.
> 
> It's rather an interesting problem, though, because the naïve computation of the LS solution works:
> 
> plot(x, y)
> X <- cbind(1, x)
> b <- solve(t(X) %*% X) %*% t(X) %*% y
> b
> abline(b)
> 
> That surprised me, because I expected that lm() computation, using the QR decomposition, would be more numerically stable.
> 
> Best,
> John
> 
> -----------------------------------------------------------------
> John Fox
> Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> Web: https://socialsciences.mcmaster.ca/jfox/
> 
> 
> 
>> 
>> On 17/04/2019 07:26, Dingyuan Wang wrote:
>>> Hi,
>>> 
>>> This input doesn't have any interesting properties except y is unix
>>> time. Spreadsheets can do this well.
>>> Is this a bug that lm can't do x ~ y?
>>> 
>>> R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
>>> Copyright (C) 2018 The R Foundation for Statistical Computing
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> 
>>>> x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
>>> 101.632, 108.928, 94.08)  > y = c(1506705739.385, 1506705766.895,
>>> 1506705746.293, 1506705761.873, 1506705734.743, 1506705735.351,
>>> 1506705756.26, 1506705761.307,
>>> 1506705747.372)
>>>> m = lm(x ~ y)
>>>> summary(m)
>>> 
>>> Call:
>>> lm(formula = x ~ y)
>>> 
>>> Residuals:
>>>      Min       1Q   Median       3Q      Max
>>> -27.0222 -14.9902  -0.6542  14.1938  29.1698
>>> 
>>> Coefficients: (1 not defined because of singularities)
>>>             Estimate Std. Error t value Pr(>|t|)
>>> (Intercept)   94.734      6.511   14.55 4.88e-07 *** y
>>> NA         NA      NA       NA
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>> 
>>> Residual standard error: 19.53 on 8 degrees of freedom
>>> 
>>>> summary(lm(y ~ x))
>>> 
>>> Call:
>>> lm(formula = y ~ x)
>>> 
>>> Residuals:
>>>     Min      1Q  Median      3Q     Max
>>> -2.1687 -1.3345 -0.9466  1.3826  2.6551
>>> 
>>> Coefficients:
>>>              Estimate Std. Error   t value Pr(>|t|)
>>> (Intercept) 1.507e+09  3.294e+00 4.574e+08  < 2e-16 *** x
>>> 6.136e-01  3.413e-02 1.798e+01 4.07e-07 ***
>>> ---
>>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>>> 
>>> Residual standard error: 1.885 on 7 degrees of freedom Multiple
>>> R-squared:  0.9788,    Adjusted R-squared:  0.9758
>>> F-statistic: 323.3 on 1 and 7 DF,  p-value: 4.068e-07
>>> 
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> ---
>>> This email has been checked for viruses by AVG.
>>> https://www.avg.com
>>> 
>>> 
>> 
>> --
>> Michael
>> http://www.dewey.myzen.co.uk/home.html
>> 
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com



More information about the R-help mailing list