[R] Python and R

Esmail Bonakdarian esmail.js at gmail.com
Thu Feb 19 14:24:28 CET 2009


Gabor Grothendieck wrote:
> On Wed, Feb 18, 2009 at 7:27 AM, Esmail Bonakdarian <esmail.js at gmail.com> wrote:
>> Gabor Grothendieck wrote:
>>>
>>> See ?Rprof for profiling your R code.
>>>
>>> If lm is the culprit, rewriting your lm calls using lm.fit might help.
>> Yes, based on my informal benchmarking, lm is the main "bottleneck"; the
>> rest of the code consists mostly of vector manipulations and control
>> structures.
>>
>> I am not familiar with lm.fit; I'll definitely look it up. I hope it's
>> similar enough to make it easy to substitute one for the other.
>>
>> Thanks for the suggestion, much appreciated. (My runs now sometimes take
>> several hours, so it would be great to cut that time down by any amount :-)
>>
> 
> Yes, the speedup can be significant.  e.g. here we cut the time down to
> 40% of the lm time by using lm.fit and we can get down to nearly 10% if
> we go even lower level:

Wow, those numbers look impressive; that would be a nice speedup to have.

I took a look at the manual and found the following at the top of
the description for lm.fit:

   "These are the basic computing engines called by lm used to fit linear
    models. These should usually not be used directly unless by experienced
    users. "

I am certainly not an experienced user, so I wonder how different it
would be to use lm.fit instead of lm.

Right now I cobble together an equation and then call lm with it and the
datafile.

I.e.,

     LM.1 <- lm(as.formula(eqn), data = datafile)
     s <- summary(LM.1)

I then extract some information from the summary stats.

I'm not quite sure what to make of the parameter list in lm.fit.

I will look online and see if I can find an example showing the use of
this - thanks for pointing me in that direction.
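
In the meantime, here is my rough guess at how the substitution might
work, going by the help pages: lm.fit wants a design matrix x and a
response vector y rather than a formula, and model.frame/model.matrix
can build those from the formula. This is only a sketch; eqn and
datafile below are stand-ins built from the EuStockMarkets data, not my
real inputs.

```r
# Stand-ins for my real inputs (hypothetical, built from a built-in dataset)
datafile <- as.data.frame(EuStockMarkets)
eqn <- "DAX ~ SMI + CAC + FTSE"

# What I do now: formula interface
LM.1 <- lm(as.formula(eqn), data = datafile)

# Same fit via lm.fit: build the design matrix and response by hand
mf <- model.frame(as.formula(eqn), data = datafile)
X  <- model.matrix(attr(mf, "terms"), mf)  # includes the "(Intercept)" column
y  <- model.response(mf)
fit <- lm.fit(X, y)

# The coefficients from both routes should agree
print(all.equal(coef(LM.1), fit$coefficients))
```

If that is right, the main cost is that fit is a plain list rather than
an "lm" object, so summary(fit) won't give the usual regression table
and I'd have to compute the statistics I need myself.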

Esmail

>> system.time(replicate(1000, lm(DAX ~.-1, EuStockMarkets)))
>    user  system elapsed
>   26.85    0.07   27.35
>> system.time(replicate(1000, lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])))
>    user  system elapsed
>   10.76    0.00   10.78
>> system.time(replicate(1000, qr.coef(qr(EuStockMarkets[,-1]), EuStockMarkets[,1])))
>    user  system elapsed
>    3.33    0.00    3.34
>> lm(DAX ~.-1, EuStockMarkets)
> 
> Call:
> lm(formula = DAX ~ . - 1, data = EuStockMarkets)
> 
> Coefficients:
>      SMI       CAC      FTSE
>  0.55156   0.45062  -0.09392
> 
>> # They all give the same coefficients:
> 
>> lm.fit(EuStockMarkets[,-1], EuStockMarkets[,1])$coef
>         SMI         CAC        FTSE
>  0.55156141  0.45062183 -0.09391815
>> qr.coef(qr(EuStockMarkets[,-1]), EuStockMarkets[,1])
>         SMI         CAC        FTSE
>  0.55156141  0.45062183 -0.09391815
>
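
P.S. Since I pull my numbers out of summary(), I would need the same
statistics from the lm.fit result. Below is a sketch of how the residual
standard error and the coefficient standard errors could be recovered
from the QR decomposition that lm.fit returns. It assumes a full-rank
design matrix (so no column pivoting takes place) and uses the same
no-intercept EuStockMarkets model as the benchmark quoted above; the
variable names are my own.

```r
# Design matrix and response for the no-intercept model DAX ~ . - 1
d <- as.data.frame(EuStockMarkets)
X <- as.matrix(d[, c("SMI", "CAC", "FTSE")])
y <- d$DAX

fit <- lm.fit(X, y)

# Residual variance: RSS divided by residual degrees of freedom
sigma2 <- sum(fit$residuals^2) / fit$df.residual

# (X'X)^{-1} from the R factor of the QR decomposition (full rank assumed)
XtXinv <- chol2inv(qr.R(fit$qr))

# Coefficient standard errors and t statistics
se   <- sqrt(diag(XtXinv) * sigma2)
tval <- fit$coefficients / se

# Compare against what summary() reports for the equivalent lm call
s <- summary(lm(DAX ~ . - 1, data = d))
print(all.equal(unname(se), unname(s$coefficients[, "Std. Error"])))
print(all.equal(sqrt(sigma2), s$sigma))
```

Whether the extra bookkeeping is worth it presumably depends on how many
of the summary() fields are actually needed.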



