[R] How to run lm for each subset of the data frame, and then aggregate the result?

David Winsemius dwinsemius at comcast.net
Sun May 19 18:19:01 CEST 2013

On May 19, 2013, at 5:31 AM, CHEN, Cheng wrote:

> Hi gurus,
> I have a big data frame df, with columns named as :
> age, income, country
> what I want to do is very simple actually: do
> fitFunc<-function(thisCountry){
>    subframe<-df[which(country==thisCountry),];
>    fit<-lm(income~0+age, data=subframe);
>    return(coef(fit));}
> for each individual country. Then aggregate the result into a new data
> frame that looks like:
>    countryname,  coeffname1      USA         1.22      GB
> 1.03      France      1.1
> I tried to do :
> do.call("rbind", lapply(countries, fitFunc))
The bare 'country' inside fitFunc suggests you have used 'attach' on df. That is not a safe practice.

> but this only gives something like:
>          age
> [1,] 2.540879
> [2,] 2.428830
> [3,] 2.369560
> How should I proceed?

That is exactly the sort of result I would have expected from your procedure, so we cannot tell what you want that is different. For one thing, you are posting in HTML, so the "aggregate result" above is mangled. I'm guessing it might have been:

countryname,  coeffname1      
USA         1.22      
GB          1.03     
France      1.1

So perhaps the only thing missing is the row names?

countries <- unique(as.character(df$country))  # one entry per country, not per row
res <- do.call("rbind", lapply(countries, fitFunc))
rownames(res) <- countries
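Since there is no data to test on, here is a self-contained sketch using a small made-up data frame. The column names age, income and country come from the question; the values are invented for illustration, with slopes chosen to mirror the guessed table. It passes the data to fitFunc explicitly instead of relying on attach, and loops over unique() countries so each country is fit exactly once.

```r
# Made-up toy data (not from the original post): within each country,
# income is exactly slope * age.
slopes <- c(USA = 1.22, GB = 1.03, France = 1.1)
df <- data.frame(age = rep(20:24, times = 3),
                 country = rep(names(slopes), each = 5),
                 stringsAsFactors = FALSE)
df$income <- unname(slopes[df$country]) * df$age

# Fit income ~ 0 + age for one country. 'dat' is passed explicitly,
# so attach(df) is not needed.
fitFunc <- function(thisCountry, dat) {
  subframe <- dat[dat$country == thisCountry, ]
  fit <- lm(income ~ 0 + age, data = subframe)
  coef(fit)
}

countries <- unique(df$country)
# Extra arguments to lapply() (here dat = df) are passed on to fitFunc.
res <- do.call("rbind", lapply(countries, fitFunc, dat = df))
rownames(res) <- countries   # label each row with its country
res
```

The result is a one-column numeric matrix with the country names as row names and "age" as the column name; as.data.frame(res) turns it into a data frame if one is needed.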

If you want a data frame returned, you could do this with the 'by' function, or have your 'fitFunc' calls return a list that includes the country instead of a bare numeric vector. rbind-ing a list of lists may give you something that can easily be coerced to a data.frame. (But with no data, I cannot test these theories.)
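A minimal sketch of the 'by' route, again on invented toy data (column names from the question, values made up). Each per-country call returns a one-row data frame holding the country and its coefficient, and the pieces are rbind-ed together. This is a variant of the list idea above that uses one-row data frames instead of lists.

```r
# Made-up toy data (not from the original post).
slopes <- c(USA = 1.22, GB = 1.03, France = 1.1)
df <- data.frame(age = rep(20:24, times = 3),
                 country = rep(names(slopes), each = 5),
                 stringsAsFactors = FALSE)
df$income <- unname(slopes[df$country]) * df$age

# by() splits df by country and applies the function to each subset;
# each call returns a one-row data frame.
fits <- by(df, df$country, function(d) {
  data.frame(country = d$country[1],
             age = unname(coef(lm(income ~ 0 + age, data = d))))
})

# Stack the one-row data frames; groups come out in factor-level
# (alphabetical) order, and row names are reset to 1, 2, 3.
res <- do.call("rbind", fits)
rownames(res) <- NULL
res
```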

David Winsemius
Alameda, CA, USA

More information about the R-help mailing list