[R] lm fails on some large input

Jeff Newmiller jdnewm|| @end|ng |rom dcn@d@v|@@c@@u@
Thu Apr 18 20:44:03 CEST 2019


I make it a general rule not to feed time values into numerical analysis algorithms without first subtracting a reasonable epoch (to obtain a difftime) and then calling as.numeric on that difftime with the units argument set explicitly (the as.double.difftime method), so that the analysis works with numeric values I can interpret. Calling difftime explicitly with a units argument does something similar, but if any other operations are performed on the result, its units could change again before the inevitable conversion to numeric happens somewhere down the line. Taking responsibility for the numeric conversion myself is therefore less likely to leave surprises.
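A minimal R sketch of this rule, reusing the data from Bill Dunlap's example quoted below. The tz = "UTC" argument and the column name t_secs are assumptions added for illustration, not part of the thread. It shows that `-` on POSIXct values chooses its units from the data, while as.numeric(difftime(...), units = "secs") pins them down.

```r
# Bill Dunlap's example data; tz = "UTC" is an added assumption so the
# result does not depend on the local time zone.
d <- data.frame(
  when = seq(as.POSIXct("2017-09-29 18:22:01", tz = "UTC"), by = "secs", length.out = 10),
  measurement = log2(1:10))

# "-" on POSIXct values picks its units from the data ("auto"):
units(d$when - min(d$when))                                    # "secs"
units(d$when - as.POSIXct("2017-09-29 00:00:00", tz = "UTC"))  # "hours"

# Subtract an explicit epoch, then convert to numeric with explicit units.
epoch <- as.POSIXct("2017-09-29 00:00:00", tz = "UTC")
d$t_secs <- as.numeric(difftime(d$when, epoch), units = "secs")  # hypothetical column name

fit <- lm(measurement ~ t_secs, data = d)
coef(fit)  # the slope is per second, whichever epoch was chosen
```

Because the units are fixed at the point of conversion, changing the epoch shifts only the intercept; the slope keeps its per-second meaning.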

On April 18, 2019 9:32:09 AM PDT, William Dunlap via R-help <r-help using r-project.org> wrote:
>This sort of data arises quite easily if you deal with time/dates around
>now.  E.g.,
>
>> d <- data.frame(
>+     when = seq(as.POSIXct("2017-09-29 18:22:01"), by="secs", len=10),
>+     measurement = log2(1:10))
>> coef(lm(data=d, measurement ~ when))
>       (Intercept)               when
>2.1791061114716954                 NA
>> as.numeric(d$when)[1:2]
>[1] 1506734521 1506734522
>
>There are problems with the time units (seconds vs. hours) if you subtract
>off a time, because the units of -.POSIXt depend on the data:
>
>> coef(lm(data=d, measurement ~ I(when - min(when))))
>        (Intercept) I(when - min(when))
>0.68327571513124297 0.33240675474232279
>> coef(lm(data=d, measurement ~ I(when - as.POSIXct("2017-09-29 00:00:00"))))
>         (Intercept) I(when - as.POSIXct("2017-09-29 00:00:00"))
>-21978.3837546251634                          1196.6643170736229
>
>
>Hence you have to use difftime and specify the units
>
>> coef(lm(data=d, measurement ~ difftime(when, as.POSIXct("2017-09-29 00:00:00"), units="secs")))
>                                                      (Intercept)
>                                          -2.1978383754612696e+04
>difftime(when, as.POSIXct("2017-09-29 00:00:00"), units = "secs")
>                                           3.3240675474248449e-01
>> coef(lm(data=d, measurement ~ difftime(when, min(when), units="secs")))
>        (Intercept) difftime(when, min(when), units = "secs")
>0.68327571513124297                       0.33240675474232279
>
>
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>
>On Thu, Apr 18, 2019 at 8:24 AM Michael Dewey <lists using dewey.myzen.co.uk> wrote:
>
>> Perhaps subtract 1506705766 from y?
>>
>> Saying some other software does it well implies you know what the
>> _correct_ answer is here, but I would question what that means with this
>> sort of data-set.
>>
>> On 17/04/2019 07:26, Dingyuan Wang wrote:
>> > Hi,
>> >
>> > This input doesn't have any interesting properties except that y is Unix
>> > time. Spreadsheets can do this well.
>> > Is it a bug that lm can't fit x ~ y?
>> >
>> > R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
>> > Copyright (C) 2018 The R Foundation for Statistical Computing
>> > Platform: x86_64-pc-linux-gnu (64-bit)
>> >
>> >  > x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
>> > 101.632, 108.928, 94.08)
>> >  > y = c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
>> > 1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
>> > 1506705747.372)
>> >  > m = lm(x ~ y)
>> >  > summary(m)
>> >
>> > Call:
>> > lm(formula = x ~ y)
>> >
>> > Residuals:
>> >       Min       1Q   Median       3Q      Max
>> > -27.0222 -14.9902  -0.6542  14.1938  29.1698
>> >
>> > Coefficients: (1 not defined because of singularities)
>> >              Estimate Std. Error t value Pr(>|t|)
>> > (Intercept)   94.734      6.511   14.55 4.88e-07 ***
>> > y                 NA         NA      NA       NA
>> > ---
>> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> >
>> > Residual standard error: 19.53 on 8 degrees of freedom
>> >
>> >  > summary(lm(y ~ x))
>> >
>> > Call:
>> > lm(formula = y ~ x)
>> >
>> > Residuals:
>> >      Min      1Q  Median      3Q     Max
>> > -2.1687 -1.3345 -0.9466  1.3826  2.6551
>> >
>> > Coefficients:
>> >               Estimate Std. Error   t value Pr(>|t|)
>> > (Intercept) 1.507e+09  3.294e+00 4.574e+08  < 2e-16 ***
>> > x           6.136e-01  3.413e-02 1.798e+01 4.07e-07 ***
>> > ---
>> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> >
>> > Residual standard error: 1.885 on 7 degrees of freedom
>> > Multiple R-squared:  0.9788,    Adjusted R-squared:  0.9758
>> > F-statistic: 323.3 on 1 and 7 DF,  p-value: 4.068e-07
>> >
>> > ______________________________________________
>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>> >
>>
>> --
>> Michael
>> http://www.dewey.myzen.co.uk/home.html
>>
>>
>
>
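To see why lm() drops y in the original post, here is a minimal R sketch. lm.fit() decides the rank of the model matrix with a pivoted QR decomposition and a default tolerance of 1e-7, and base qr() uses the same default tolerance, so it can be used to inspect the same decision. The name y0 is illustrative; subtracting an offset is the fix Michael Dewey suggested.

```r
# Data from Dingyuan Wang's original post.
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Once the intercept column is accounted for, what remains of y has a norm
# of a few tens, while y's own norm is about 4.5e9. That is far below the
# 1e-7 relative tolerance, so y is treated as collinear with the intercept.
qr(cbind(1, y))$rank    # 1, hence the NA coefficient

# Subtract an offset (here the smallest value) before fitting.
y0 <- y - min(y)        # hypothetical name for the shifted times
qr(cbind(1, y0))$rank   # 2
fit <- lm(x ~ y0)
summary(fit)$r.squared  # matches the R-squared of lm(y ~ x) in the post
```

lm(y ~ x) works without any shift because the large values sit in the response, not in the model matrix whose rank is being checked.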

-- 
Sent from my phone. Please excuse my brevity.


