[R] How can I get "predict.lm" results with manual calculations ? (a floating point problem)

David Winsemius dwinsemius at comcast.net
Sun Sep 13 19:34:41 CEST 2009


On Sep 13, 2009, at 1:19 PM, Tal Galili wrote:

> Hello dear r-help group
>
> I am turning for you for help with FAQ number 7.31: "Why doesn't R  
> think
> these numbers are equal?"
> http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f
>
>
> *My story* is this:
> I wish to run many lm predictions and need to have them run fast.
> Using predict.lm is relatively slow, so I tried having it run faster  
> by
> doing the prediction calculations manually.
> But doing that gave me problematic results (I won't go into the  
> details of
> how I found that out).
>
> I then discovered that the problem was that the manual calculations  
> I used
> for the lm predictions yielded different results than that of  
> predict.lm,
>
> *here is an example*:
>
> predict.lm.diff.from.manual.compute <- function(sample.size = 100)
> {
> x <- rnorm(sample.size)
> y <- x + rnorm(sample.size)
> new <- data.frame(x = seq(-3, 3, length.out = sample.size))
> aa <- lm(y ~ x)
>
> predict.lm.result <- sum(predict(aa, new, se.fit = F))
> manual.lm.compute.result <- sum(aa$coeff[1]+ new * aa$coeff[2])
>
> # manual.lm.compute.result == predict.lm.result
> return(all.equal(manual.lm.compute.result , predict.lm.result, tol=0))
> }
>
> # and here are the results of running the code several times:
>
>> predict.lm.diff.from.manual.compute(100)
> [1] "Mean relative difference: 1.046407e-15"
>> predict.lm.diff.from.manual.compute(1000)
> [1] "Mean relative difference: 4.113951e-16"
>> predict.lm.diff.from.manual.compute(10000)
> [1] "Mean relative difference: 2.047455e-14"
>> predict.lm.diff.from.manual.compute(100000)
> [1] "Mean relative difference: 1.294251e-14"
>> predict.lm.diff.from.manual.compute(1000000)
> [1] "Mean relative difference: 5.508314e-13"

Are these meaningfully different? You are using different datasets and  
didn't even set a seed for your RNG. They seem really quite close. Why  
should we care?

 > .Machine$double.eps
[1] 2.220446e-16

>>
> And that leaves me with *the question*:
> Can I reproduce more accurate results from the manual calculations  
> (as the
> ones I might have gotten from predict.lm) ?
> Maybe some parameter to increase the precision of the computation ?
>
> Many thanks,
> Tal

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT




More information about the R-help mailing list