[R] Create variables with common values for each group

Chuck Cleland ccleland at optonline.net
Tue Jun 20 11:02:04 CEST 2006


Stephan Lindner wrote:
> Dear all,
> 
> Sorry, this is surely very basic, but I searched a lot on the
> internet and just couldn't find a solution.
> 
> The problem is to create new variables from a data frame which
> contains both individual and group variables, such as the mean age of a
> household. My data frame:
>
> df
> 
>        hhid h.age
> 1  10010020    23
> 2  10010020    23
> 3  10010126    42
> 4  10010126    60
> 5  10010142    20
> 6  10010142    49
> 7  10010142    52
> 8  10010150    18
> 9  10010150    51
> 10 10010150    28
>
> where hhid identifies the household (the same number for all of its
> members) and h.age is the age of each household member.
> 
> I tried tapply(), by(), and aggregate(). The best I could get was:
> 
> by(df, df$hhid, function(d) rep(mean(d$h.age, na.rm = TRUE), nrow(d)))
> 
> df$hhid: 10010020
> [1] 23 23
> ------------------------------------------------------------ 
> df$hhid: 10010126
> [1] 51 51
> ------------------------------------------------------------ 
> df$hhid: 10010142
> [1] 40.33333 40.33333 40.33333
> ------------------------------------------------------------ 
> df$hhid: 10010150
> [1] 32.33333 32.33333 32.33333
>
> Now, in principle, I would only have to stack up the mean values, and
> this is where I'm stuck. The function aggregate() works nicely, and I
> could then loop, but I was wondering whether there is a better way to
> do this.
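One way to stack the per-household means without a loop (an illustration, not part of the original reply) is to compute them once with tapply() and then look each row's value up by its hhid. Unlike stacking the by() output with unlist(), this lookup does not depend on the rows already being sorted by household. The data frame below simply re-enters the example data from the question.

```r
# Example data re-entered from the question
df <- data.frame(
  hhid  = c(10010020, 10010020, 10010126, 10010126, 10010142,
            10010142, 10010142, 10010150, 10010150, 10010150),
  h.age = c(23, 23, 42, 60, 20, 49, 52, 18, 51, 28)
)

# One mean per household; the result is named by the hhid values
hh.means <- tapply(df$h.age, df$hhid, mean, na.rm = TRUE)

# Index the named vector by each row's hhid, so row order does not matter
df$mean.age <- as.vector(hh.means[as.character(df$hhid)])
df
```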

   You could use aggregate() and then merge() the result with df. 
Something like this:

 > df.agg <- aggregate(df$h.age, list(hhid = df$hhid), mean)
 >
 > names(df.agg)[2] <- "mean.age"
 >
 > merge(df, df.agg)
        hhid h.age mean.age
1  10010020    23 23.00000
2  10010020    23 23.00000
3  10010126    42 51.00000
4  10010126    60 51.00000
5  10010142    20 40.33333
6  10010142    49 40.33333
7  10010142    52 40.33333
8  10010150    18 32.33333
9  10010150    51 32.33333
10 10010150    28 32.33333
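Two points about the merge() approach are worth keeping in mind: merge() returns a new data frame instead of changing df, and by default it sorts the result by the key column, so the output rows can come back in a different order from the input. A minimal sketch of assigning the result back while keeping the original row order follows; the helper column `row.id` is a hypothetical name introduced only for this illustration.

```r
# Example data re-entered from the question
df <- data.frame(
  hhid  = c(10010020, 10010020, 10010126, 10010126, 10010142,
            10010142, 10010142, 10010150, 10010150, 10010150),
  h.age = c(23, 23, 42, 60, 20, 49, 52, 18, 51, 28)
)
df$row.id <- seq_len(nrow(df))  # hypothetical helper column remembering input order

# Household means, as in the reply above
df.agg <- aggregate(df$h.age, list(hhid = df$hhid), mean)
names(df.agg)[2] <- "mean.age"

# merge() joins on the shared column 'hhid' and sorts by it
df <- merge(df, df.agg)

# Restore the input order, drop the helper column, and reset row names
df <- df[order(df$row.id), setdiff(names(df), "row.id")]
rownames(df) <- NULL
df
```

With the example data the rows are already sorted by hhid, so nothing moves here; the helper column only matters when the input is not sorted by household.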

> My end result should look like this (with mean.age assigned to the data frame):
>
>        hhid h.age  mean.age
> 1  10010020    23     23.00
> 2  10010020    23     23.00
> 3  10010126    42     51.00
> 4  10010126    60     51.00
> 5  10010142    20     40.33
> 6  10010142    49     40.33
> 7  10010142    52     40.33
> 8  10010150    18     32.33
> 9  10010150    51     32.33
> 10 10010150    28     32.33
>
> Cheers, and thanks a lot,
>
> Stephan Lindner
> 
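The requested layout can also be produced without a separate merge step. The stats function ave() (not used in the original reply) returns a vector of the same length as its input, with each element replaced by a summary of its group, in the original row order. This is a sketch only; the round() call is there just to match the two-decimal display in the question, and it changes the stored values, so drop it to keep full precision.

```r
# Example data re-entered from the question
df <- data.frame(
  hhid  = c(10010020, 10010020, 10010126, 10010126, 10010142,
            10010142, 10010142, 10010150, 10010150, 10010150),
  h.age = c(23, 23, 42, 60, 20, 49, 52, 18, 51, 28)
)

# ave() applies FUN within each hhid group and returns one value per row
df$mean.age <- ave(df$h.age, df$hhid, FUN = function(v) mean(v, na.rm = TRUE))

# Round to two decimals to match the layout shown in the question
df$mean.age <- round(df$mean.age, 2)
df
```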

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



More information about the R-help mailing list