[R] lm() with same formula but different column/factor combinations in data frame

Fri Dec 26 20:38:06 CET 2008

Try variations of this:

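library(leaps)
b <- regsubsets(Fertility ~ ., data = swiss)
# one row per model size, one logical column per term (intercept first)
w <- summary(b)$which
# w's "(Intercept)" column lines up with swiss's first column, Fertility
lapply(seq_len(nrow(w)), function(i) coef(lm(Fertility ~ ., swiss[w[i, ]])))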

On Fri, Dec 26, 2008 at 1:57 PM, Murtaza Das <murtazadas at gmail.com> wrote:
> Thanks for replying Gabor.
>
> I checked the leaps() function and I think it is intended to find the
> best combination of predictors in the linear model.
> Does leaps have a way to combine different factor columns in my data
> frame, as follows?
>
> I have the regression model fixed. The combination of predictor
> variables used always remains the same.
> UncDmd ~ M1 + M2 + M3 + M4 + M5 + M6 + M7 + M8 + M9 + M10 + M11
>
> I want to get the coefficients of this linear model when different
> combinations of factors (selected from the first four columns of the
> data frame) and their levels are taken from the data frame, that is,
> fit the lm model once for each combination of levels within the
> selected factor columns. Thus, for each combination, the data used to
> determine the model coefficients will be different.
>
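For a single choice of factor columns, one loop-free possibility is split() followed by sapply(). What follows is only a sketch under assumptions: the TEST1 built here is a synthetic stand-in with made-up values, the formula is shortened to M1..M3 to keep the example small, and `keys` is just an illustrative name for the selected factor columns.

```r
# Synthetic stand-in for the poster's data (all values are made up).
set.seed(1)
grid <- expand.grid(Community = "20232",
                    WT  = c("B", "E", "M"),
                    LTC = c("L", "M", "S", "1"),
                    UC  = c("1X1", "2X2"))
TEST1 <- grid[rep(seq_len(nrow(grid)), each = 10), ]
TEST1$UncDmd <- runif(nrow(TEST1))
for (m in paste0("M", 1:3)) TEST1[[m]] <- rnorm(nrow(TEST1))

# Shortened formula (assumption); the thread uses M1 + ... + M11.
Formula <- UncDmd ~ M1 + M2 + M3

# Factor columns that define the groups.
keys <- c("WT", "LTC")

# One data frame per observed combination of key levels;
# drop = TRUE skips combinations that have no rows.
pieces <- split(TEST1, TEST1[keys], drop = TRUE)

# Fit the same formula to every piece; one row of coefficients per piece.
coefs <- t(sapply(pieces, function(d) coef(lm(Formula, data = d))))
```

The row names of `coefs` are the level combinations joined with a dot (for example "B.L"), so changing `keys` is enough to get a different grouping.
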
> I am attaching the data and R files (long method using loops) that I
> use to get the result. Currently, I modify keys to get different
> combinations. Also, note in the script, the data frame is named LRO1.
>
> Thanks again,
> Murtaza
>
>
> On Fri, Dec 26, 2008 at 12:58 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> See the leaps package.
>>
>> On Fri, Dec 26, 2008 at 12:37 PM, Murtaza Das <murtazadas at gmail.com> wrote:
>>> Hi,
>>>
>>> I am trying to find an efficient way of applying a linear regression
>>> model to different factor combinations in a data frame.
>>> I want to obtain the output with minimal or no use of loops if
>>> possible. Please let me know if this query is unclear.
>>>
>>> Thanks,
>>> Murtaza
>>>
>>> ***********************************************************************************************************************************************************
>>>
>>> The data frame TEST1 has four factor columns followed by twelve
>>> numeric columns, defined as:
>>> 1) Community, levels: "20232"
>>> 2) WT, levels: "B", "E", "M"
>>> 3) LTC, levels: "L", "M", "S", "1"
>>> 4) UC, levels: "1X1", "2X2"
>>> 5) UncDmd: Response variable in the linear model
>>> 6-16) M1...M11: Explanatory variables in the linear model
>>>
>>> A few sample rows in the data frame are as follows:
>>>> TEST1[1:15,]
>>>   Community WT LTC  UC   UncDmd M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11
>>> 1      20232  E   L 1X1 1.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 2      20232  E   L 2X2 0.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 3      20232  E   M 1X1 1.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 4      20232  E   M 2X2 1.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 5      20232  E   S 1X1 0.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 6      20232  E   S 2X2 0.000000  0  0  0  0  1  0  0  0  0   0   0
>>> 7      20232  B   1 1X1 0.209117  0  0  0  0  0  0  0  0  0   0   1
>>> 8      20232  B   1 2X2 0.190605  0  0  0  0  0  0  0  0  0   0   1
>>> 9      20232  B   L 1X1 0.000000  0  0  0  0  1  0  0  0  0   0   0
>>> 10     20232  B   L 2X2 1.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 11     20232  B   M 1X1 4.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 12     20232  B   M 2X2 0.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 13     20232  B   S 1X1 0.000000  1  0  0  0  0  0  0  0  0   0   0
>>> 14     20232  B   S 2X2 0.000000  0  0  0  0  0  0  0  0  0   0   1
>>> 15     20232  M   1 1X1 0.618689  0  0  0  0  0  0  0  0  0   1   0
>>>
>>> *********************************************************************************************************************************************************
>>> I need to store the coefficients using lm() for different combinations
>>> of the 4 factors, or different combinations of 3 factors, or different
>>> combinations of 2 factors, or different combinations of 1 factor.
>>> The formula remains fixed as:
>>>> Formula
>>> UncDmd ~ M1 + M2 + M3 + M4 + M5 + M6 + M7 + M8 + M9 + M10 + M11
>>>
>>> So the different models I want to fit in R are:
>>> 1) Community :                     lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") ) , ])
>>> 2) WT :                            lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="B") ) , ])
>>> 3) WT :                            lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="E") ) , ])
>>> 4) WT :                            lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="M") ) , ])
>>> 5) LTC :                           lm(Formula,TEST1[  as.logical(
>>> (TEST1[[3]]=="L") ) , ])
>>> 6) LTC :                           lm(Formula,TEST1[  as.logical(
>>> (TEST1[[3]]=="M") ) , ])
>>> 7) LTC :                           lm(Formula,TEST1[  as.logical(
>>> (TEST1[[3]]=="S") ) , ])
>>> 8) LTC :                           lm(Formula,TEST1[  as.logical(
>>> (TEST1[[3]]=="1") ) , ])
>>> 9) UC :                            lm(Formula,TEST1[  as.logical(
>>> (TEST1[[4]]=="1X1") ) , ])
>>> 10) UC :                           lm(Formula,TEST1[  as.logical(
>>> (TEST1[[4]]=="2X2") ) , ])
>>> 11) Community, WT :                lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[2]]=="B") ) , ])
>>> 12) Community, WT :                lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[2]]=="E") ) , ])
>>> 13) Community, WT :                lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[2]]=="M") ) , ])
>>> 14) Community, LTC :               lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[3]]=="L") ) , ])
>>> 15) Community, LTC :               lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[3]]=="M") ) , ])
>>> 16) Community, LTC :               lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[3]]=="S") ) , ])
>>> 17) Community, LTC :               lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[3]]=="1") ) , ])
>>> 18) Community, UC :                lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[4]]=="1X1") ) , ])
>>> 19) Community, UC :                lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[4]]=="2X2") ) , ])
>>> 20) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="B") * (TEST1[[3]]=="L") ) , ])
>>> 21) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="B") * (TEST1[[3]]=="M") ) , ])
>>> 22) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="B") * (TEST1[[3]]=="S") ) , ])
>>> 23) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="B") * (TEST1[[3]]=="1") ) , ])
>>> 24) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="E") * (TEST1[[3]]=="L") ) , ])
>>> 25) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="E") * (TEST1[[3]]=="M") ) , ])
>>> 26) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="E") * (TEST1[[3]]=="S") ) , ])
>>> 27) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="E") * (TEST1[[3]]=="1") ) , ])
>>> 28) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="M") * (TEST1[[3]]=="L") ) , ])
>>> 29) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="M") * (TEST1[[3]]=="M") ) , ])
>>> 30) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="M") * (TEST1[[3]]=="S") ) , ])
>>> 31) WT, LTC :                      lm(Formula,TEST1[  as.logical(
>>> (TEST1[[2]]=="M") * (TEST1[[3]]=="1") ) , ])
>>> 32) WT, UC :
>>> ...
>>> ...
>>> xx) LTC, UC :
>>> ...
>>> xxx) Community, WT, LTC :
>>> ...
>>> ...
>>> and so on up to:
>>> xxxx) Community, WT, LTC, UC :  lm(Formula,TEST1[  as.logical(
>>> (TEST1[[1]]=="20232") * (TEST1[[2]]=="M") * (TEST1[[3]]=="1") *
>>> (TEST1[[4]]=="2X2") ) , ])
>>> ***********************************************************************************************************************************************************
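The list above can be generated rather than typed. The sketch below (assumptions: a synthetic TEST1 holding only the four factor columns, and illustrative names `subsets`, `combos`, `n_models`) first checks that the `as.logical(a * b)` mask selects the same rows as a plain `&`, then enumerates every non-empty subset of the factor columns with combn() and collects the level combinations observed for each.

```r
# Synthetic stand-in: only the four factor columns, one row per combination.
TEST1 <- expand.grid(Community = "20232",
                     WT  = c("B", "E", "M"),
                     LTC = c("L", "M", "S", "1"),
                     UC  = c("1X1", "2X2"))

# The product-of-comparisons mask and a logical AND pick the same rows.
mask_product <- as.logical((TEST1[[2]] == "B") * (TEST1[[3]] == "L"))
mask_and     <- TEST1$WT == "B" & TEST1$LTC == "L"
same_rows    <- identical(mask_product, mask_and)

# All non-empty subsets of the factor columns (1, 2, 3 and 4 at a time).
factor_cols <- names(TEST1)[1:4]
subsets <- unlist(lapply(seq_along(factor_cols),
                         function(k) combn(factor_cols, k, simplify = FALSE)),
                  recursive = FALSE)

# For each subset, the level combinations that occur: one model each.
combos <- lapply(subsets, function(cols) unique(TEST1[cols]))
n_models <- sum(sapply(combos, nrow))
```

With levels of size 1, 3, 4 and 2, and every combination present, this gives 15 subsets and (1+1)(3+1)(4+1)(2+1) - 1 = 119 models in total.
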
>>> Desired output format (or something similar):
>>>      Factor1  Factor2  Factor3  Factor4  Intercept  M1  M2  M3  M4  M5  M6  M7  M8  M9  M10  M11
>>> 1)   20232                               x          x   x   x   x   x   x   x   x   x   x    x
>>> 2)            B                          x          x   x   x   x   x   x   x   x   x   x    x
>>> 3)            E                          x          x   x   x   x   x   x   x   x   x   x    x
>>> 4)            M                          x          x   x   x   x   x   x   x   x   x   x    x
>>> 5)                     L                 x          x   x   x   x   x   x   x   x   x   x    x
>>> 6)                     M                 x          x   x   x   x   x   x   x   x   x   x    x
>>> 7)                     S                 x          x   x   x   x   x   x   x   x   x   x    x
>>> 8)                     1                 x          x   x   x   x   x   x   x   x   x   x    x
>>> 9)                              1X1      x          x   x   x   x   x   x   x   x   x   x    x
>>> 10)                             2X2      x          x   x   x   x   x   x   x   x   x   x    x
>>> 11)  20232    B                          x          x   x   x   x   x   x   x   x   x   x    x
>>> ..
>>> ..
>>> and so on..
>>>
>>>
>>> x is the respective coefficient obtained from the linear fit.
>>>
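A table in roughly this shape can be assembled by fitting each subset of factor columns in turn and stacking the results. This is a sketch under assumptions, not a tested solution for the real data: TEST1 is a synthetic stand-in with made-up values, the formula is shortened to M1..M3, the key columns keep their real names instead of Factor1..Factor4, and `fit_subset` is a hypothetical helper.

```r
# Synthetic stand-in for the poster's data (all values are made up).
set.seed(1)
grid <- expand.grid(Community = "20232",
                    WT  = c("B", "E", "M"),
                    LTC = c("L", "M", "S", "1"),
                    UC  = c("1X1", "2X2"))
TEST1 <- grid[rep(seq_len(nrow(grid)), each = 10), ]
TEST1$UncDmd <- runif(nrow(TEST1))
for (m in paste0("M", 1:3)) TEST1[[m]] <- rnorm(nrow(TEST1))

Formula     <- UncDmd ~ M1 + M2 + M3   # shortened (assumption)
factor_cols <- c("Community", "WT", "LTC", "UC")

# Hypothetical helper: rows of the output table for one subset of factors.
fit_subset <- function(cols) {
  pieces <- split(TEST1, TEST1[cols], drop = TRUE)
  # Key columns: the level used for each selected factor, "" otherwise.
  keys <- as.data.frame(matrix("", nrow = length(pieces),
                               ncol = length(factor_cols),
                               dimnames = list(NULL, factor_cols)),
                        stringsAsFactors = FALSE)
  for (cl in cols) {
    keys[[cl]] <- vapply(pieces, function(d) as.character(d[[cl]][1]),
                         character(1), USE.NAMES = FALSE)
  }
  coefs <- t(sapply(pieces, function(d) coef(lm(Formula, data = d))))
  rownames(coefs) <- NULL
  cbind(keys, coefs)
}

# Every non-empty subset of the factor columns, then one stacked table.
subsets <- unlist(lapply(seq_along(factor_cols),
                         function(k) combn(factor_cols, k, simplify = FALSE)),
                  recursive = FALSE)
out <- do.call(rbind, lapply(subsets, fit_subset))
rownames(out) <- NULL
```

Here `head(out)` shows the Community-only row first, followed by the three WT rows, with empty strings in the key columns that were not used for that fit.
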
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>