[R] stats lm() function

Paul Hermes paul.hermes at analytic-company.com
Fri Mar 13 02:43:56 CET 2009


OK,
I think I have to be more precise about what we are doing.
First thing: this code is not mine, and I'm new to R (and have never touched 
anything like this).
I'm just the lucky guy who has to maintain this crap :)
This call to the lm function is part of code which is used to predict the 
market values of a bunch of our products.
As the 'target' it gets the past market values we have in our 
database (this is what goes into the 'data' parameter of the lm function).

Then we have a lot of other prices and environmental data (like similar 
products, stock sizes, seasonal information, ...).
From these, the big formula is created (y ~ x1 + x2 + x3 + x4 + x5 ....... + 
x300).
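
A formula of this shape does not have to be typed out by hand. As a minimal sketch, assuming the predictors really are columns named x1 to x300 and the response is a column named y (the real column names are not shown in this thread), it could be built with reformulate() from the stats package:

```r
# Hypothetical column names: x1 ... x300 for the predictors, y for the response.
predictors <- paste0("x", 1:300)

# reformulate() assembles y ~ x1 + x2 + ... + x300 from the character vector.
big_formula <- reformulate(predictors, response = "y")
```

The resulting object can be passed to lm() like any formula written by hand.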


All this goes into the lm call. Then the result is somehow analysed to 
figure out which input data set had the least influence on (or similarity to) 
the past market values. This one gets eliminated and lm is called again 
without this data set.
This is done until we have just a small number of data sets left.
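
The loop described above might look roughly like the sketch below. How the real code measures "least influence" is not stated here, so the criterion used (smallest absolute t-statistic) is an assumption, as are the simulated data, the column names and the stopping size n_keep.

```r
# Small simulated data set as a stand-in for the real one (assumption).
set.seed(1)
n <- 200; p <- 10
dat <- as.data.frame(matrix(rnorm(n * p), n, p))
names(dat) <- paste0("x", seq_len(p))
dat$y <- 2 * dat$x1 - 3 * dat$x2 + rnorm(n)

keep   <- paste0("x", seq_len(p))  # predictors still in the model
n_keep <- 3                        # hypothetical stopping size

while (length(keep) > n_keep) {
  # Refit with the remaining predictors; lm() rebuilds the model frame each time.
  fit <- lm(reformulate(keep, response = "y"), data = dat)
  # Assumed criterion: drop the predictor with the smallest |t|.
  # (This indexing assumes no coefficient is aliased, i.e. full-rank data.)
  tvals <- abs(coef(summary(fit))[keep, "t value"])
  keep  <- setdiff(keep, names(which.min(tvals)))
}
```

Each pass re-parses the formula and rebuilds the model frame and design matrix from scratch, which is where much of the per-call overhead of lm() comes from.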

It could be that everything I'm writing here is total bullshit (because I'm not 
sure I got everything right),
but this thing is working and creates very nice predictions ;)

I just figured out that the lm calls in this loop take the most time, and I 
want to reduce this.
Any ideas?
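
A rough sketch of the "prepare once, then call only lm.fit()" idea from the original message: the design matrix is built a single time with model.matrix(), and each step just selects columns. lm.fit() returns no summary table, so the t-statistics are computed by hand here. The simulated data, the column names, the |t| criterion and n_keep are all assumptions, and the sketch assumes numeric predictors and a full-rank design.

```r
# Small simulated data set as a stand-in for the real one (assumption).
set.seed(1)
n <- 200; p <- 10
dat <- as.data.frame(matrix(rnorm(n * p), n, p))
names(dat) <- paste0("x", seq_len(p))
dat$y <- 2 * dat$x1 - 3 * dat$x2 + rnorm(n)

# Build the design matrix (with intercept column) once, outside the loop.
X <- model.matrix(reformulate(names(dat)[seq_len(p)]), data = dat)
y <- dat$y
keep   <- setdiff(colnames(X), "(Intercept)")
n_keep <- 3  # hypothetical stopping size

while (length(keep) > n_keep) {
  fit <- lm.fit(X[, c("(Intercept)", keep), drop = FALSE], y)
  # Residual variance and standard errors, as summary.lm() would compute them.
  sigma2 <- sum(fit$residuals^2) / fit$df.residual
  se     <- sqrt(sigma2 * diag(chol2inv(qr.R(fit$qr))))
  # Assumed criterion: drop the predictor with the smallest |t|.
  tvals  <- abs(fit$coefficients / se)[keep]
  keep   <- setdiff(keep, names(which.min(tvals)))
}
```

This only removes the per-call overhead of lm(); it does not change which variables get eliminated, so the concerns about step-down selection in the reply quoted below still apply.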

----- Original Message ----- 
From: "David Winsemius" <dwinsemius at comcast.net>
To: "Paul Hermes" <paul.hermes at analytic-company.com>
Cc: <r-help at r-project.org>
Sent: Thursday, March 12, 2009 3:42 PM
Subject: Re: [R] stats lm() function


>I think you will find that many readers of this list would rather try to 
>dissuade you from this misguided strategy. You are unlikely to get to a 
>sensible solution in using step-down procedures with this sort of 
>situation (large number of predictors with modest size of data).
>
> -- 
> David Winsemius
>
> On Mar 12, 2009, at 1:59 PM, Paul Hermes wrote:
>
>> Hi,
>>
>> I'm using the lm() function where the formula is quite big (300 
>> arguments) and the data is a frame of 3000 values.
>>
>> This is running in a loop where in each step the formula is reduced by 
>> one argument, and the lm command is called again (to check which 
>> arguments are useful).
>>
>> This takes 1-2 minutes.
>> Is there a way to speed this up?
>> I checked the code of the lm function and it seems that it is preparing 
>> the data and then calls lm.fit(). I thought about just doing this 
>> preparation first and then only calling lm.fit() 300 times.
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT



