[R] linear regression for grouped data

David Winsemius dwinsemius at comcast.net
Wed Dec 29 03:31:48 CET 2010


On Dec 28, 2010, at 9:23 PM, Entropi ntrp wrote:

> Hi,
> I have been examining large data and need to do simple linear regression
> with the data which is grouped based on the values of a particular
> attribute. For instance, consider three columns : ID, x, y, and I need to
> regress x on y for each distinct value of ID. Specifically, for the set of
> data corresponding to each of the 4 values of ID (76,111,121,168) in the
> below data, I should invoke linear regression 4 times. The challenge is
> that, the length of the ID vector is around 20000 and therefore linear
> regression must be done automatically for each distinct value of ID.
>
>   ID      x      y
>   76  36476   15.8
>   76  36493   66.9
>   76  36579   65.6
>  111  35465   10.3
>  111  35756    4.8
>  121  38183   16
>  121  38184   15
>  121  38254    9.6
>  121  38255    7
>  168  37727   21.9
>  168  37739   29.7
>  168  37746   97.4

Let's say that is a data frame named "indat". Try:

  lapply(split(indat, as.factor(indat$ID)),
         function(df) { lm(y ~ x, data = df) })
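
As a minimal sketch of how that might look end to end, the block below enters the twelve rows from the question into a data frame, fits one model per ID, and gathers the coefficients into a matrix. The names "fits" and "coefs" are only illustrative, and the formula y ~ x treats y as the response; use x ~ y instead if x is meant to be the response.

```r
# Sample rows from the question, entered as a data frame named `indat`
indat <- data.frame(
  ID = c(76, 76, 76, 111, 111, 121, 121, 121, 121, 168, 168, 168),
  x  = c(36476, 36493, 36579, 35465, 35756, 38183, 38184, 38254, 38255,
         37727, 37739, 37746),
  y  = c(15.8, 66.9, 65.6, 10.3, 4.8, 16, 15, 9.6, 7, 21.9, 29.7, 97.4)
)

# split() returns a named list of data frames, one per distinct ID;
# lapply() then fits a separate lm to each piece
fits <- lapply(split(indat, indat$ID), function(df) lm(y ~ x, data = df))

# Extract intercept and slope from every fit; after t() there is
# one row per ID and one column per coefficient
coefs <- t(sapply(fits, coef))
print(coefs)
```

Individual fits stay available by name, e.g. summary(fits[["121"]]). Note that an ID with only two rows (111 here) is fitted exactly, so it has no residual degrees of freedom and no standard errors.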

> I was wondering whether there is an easy way to group data based on the
> values of ID in R so that linear regression can be done easily for each
> group determined by each value of ID. Or, is the only way to construct
> loops with 'for' or 'while' in which a matrix is generated for each
> distinct value of ID that stores corresponding values of x and y by
> screening the entire ID vector?
>
> Thanks in advance,
>
> Yasin

-- 

David Winsemius, MD
West Hartford, CT


