[R] Python and R

Douglas Bates bates at stat.wisc.edu
Sat Feb 21 18:15:55 CET 2009


Different methods of performing least squares calculations in R are discussed in

@Article{Rnews:Bates:2004,
  author       = {Douglas Bates},
  title        = {Least Squares Calculations in {R}},
  journal      = {R News},
  year         = 2004,
  volume       = 4,
  number       = 1,
  pages        = {17--20},
  month        = {June},
  url          = http,
  pdf          = Rnews2004-1
}

Some of the functions mentioned in that article have been modified.  A
more up-to-date version of the comparisons in that article is
available as the Comparisons vignette in the Matrix package.

On Fri, Feb 20, 2009 at 6:06 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Note that using solve can be numerically unstable for certain problems.
>
> On Fri, Feb 20, 2009 at 6:50 AM, Kenn Konstabel <lebatsnok at gmail.com> wrote:
>> Decyphering formulas seems to be the most time consuming part of lm:
>>
>> mylm1 <- function(formula, data) {
>>     # not perfect but works
>>     F <- model.frame(formula,data)
>>     y <- model.response(F)
>>     mt <- attr(F, "terms")
>>     x <- model.matrix(mt,F)
>>     coefs <- solve(crossprod(x), crossprod(x,y))
>>     coefs
>>     }
>>
>> mylm2 <- function(x, y, intercept=TRUE) {
>>     if(!is.matrix(x)) x <- as.matrix(x)
>>     if(intercept) x <- cbind(1,x)
>>     if(!is.matrix(y)) y <- as.matrix(y)
>>     solve(crossprod(x), crossprod(x,y))
>>     }
>>
>>> system.time(for(i in 1:1000) mylm2(EuStockMarkets[,-1],
>>> EuStockMarkets[,"DAX"]))
>>    user  system elapsed
>>    6.43    0.00    6.53
>>> system.time(for(i in 1:1000) mylm1(DAX~., EuStockMarkets))
>>    user  system elapsed
>>   16.19    0.00   16.23
>>> system.time(for(i in 1:1000) lm(DAX~., EuStockMarkets))
>>    user  system elapsed
>>   21.43    0.00   21.44
>>
>> So if you need to save time, I'd suggest something close to mylm2 rather
>> than mylm1.
>>
>> Kenn
>>
>>
>>
>> On Thu, Feb 19, 2009 at 8:04 PM, Gabor Grothendieck
>> <ggrothendieck at gmail.com> wrote:
>>>
>>> On Thu, Feb 19, 2009 at 8:30 AM, Esmail Bonakdarian <esmail.js at gmail.com>
>>> wrote:
>>>
>>> > Thanks for the suggestions, I'll have to see if I can figure out how to
>>> > convert the relatively simple call to lm with an equation and the data
>>> > file
>>> > to the functions you mention (or if that's even feasible).
>>>
>>> X <- model.matrix(formula, data)
>>>
>>> will calculate the X matrix for you.
>>>
>>> >
>>> > Not an expert in statistics myself, I am mostly concentrating on the
>>> > programming aspects of R. Problem is that I suspect my colleagues who
>>> > are providing some guidance with the stats end are not quite experts
>>> > themselves, and certainly new to R.
>>> >
>>> > Cheers,
>>> >
>>> > Esmail
>>> >
>>> > Kenn Konstabel wrote:
>>> >>
>>> >> lm does lots of computations, some of which you may never need. If
>>> >> speed
>>> >> really matters, you might want to compute only those things you will
>>> >> really
>>> >> use. If you only need coefficients, then using %*%, solve and crossprod
>>> >> will
>>> >> be remarkably faster than lm
>>> >>
>>> >> # repeating someone else's example
>>> >> # lm(DAX~., EuStockMarkets)
>>> >>
>>> >>  y <- EuStockMarkets[,"DAX"]
>>> >>  x <- EuStockMarkets
>>> >>  x[,1]<-1
>>> >> colnames(x)[1] <- "Intercept"
>>> >>
>>> >> lm(y ~ x-1)
>>> >> solve(crossprod(x), t(x))%*%y    # probably this can be done more
>>> >> efficiently
>>> >>
>>> >> # and a naive timing
>>> >>
>>> >>  > system.time( for(i in 1:1000) lm(y ~ x-1))
>>> >>   user  system elapsed
>>> >>  14.64    0.33   32.69
>>> >>  > system.time(for(i in 1:1000) solve(crossprod(x), crossprod(x,y)) )
>>> >>   user  system elapsed
>>> >>   0.36    0.00    0.36
>>> >>
>>> >>
>>> >> Also lsfit() is a bit quicker than lm or lm.fit.
>>> >>
>>> >> Regards,
>>> >> Kenn
>>> >
>>> > ______________________________________________
>>> > R-help at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> > http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>> >
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




More information about the R-help mailing list