[R] tapply and names

Göran Broström gb at tal.stat.umu.se
Tue Jan 25 19:43:16 CET 2005


On Tue, Jan 25, 2005 at 10:43:24AM -0500, Liaw, Andy wrote:
> > From: Göran Broström
> > 
> > I have a data frame containing children, with variables 'year' = birth
> > year, and 'm.id' = mother's id number. Let's assume that all the births
> > of each mother are represented in the data frame.
> > 
> > Now I want to create a subset of this data frame containing all children
> > whose mother's first birth was in the year 1816 or later. This seems to
> > work:
> > 
> >     mid <- tapply(dat$year, dat$m.id, min)
> >     mid <- as.numeric(names(mid)[mid >= 1816])
> >     dat <- dat[dat$m.id %in% mid, ]
> > 
> > but I'm worried about the second line, because the output from 'tapply'
> > isn't documented to have a 'dimnames' attribute (although it has one, at
> > least in R-2.1.0, 2005-01-19). Another aspect is that this code relies on
> > m.id being numeric; I would have to change it if the type of m.id changes
> > to, e.g., character.
> > 
> > So, question: Is there a better way of doing this?
> 
> Would this work?
> 
>   dat <- dat[ave(dat$year, dat$m.id, min) >= 1816, ]

Yes, but you (or I) need

> dat <- dat[ave(dat$year, dat$m.id, FUN = min) >= 1816, ]
                                     ^^^^^
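(took me some time to figure out), because 'FUN' comes after '...' in the
argument list of 'ave' and so must be named in full; an unnamed 'min' is
taken as one more grouping variable: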

?ave

Usage:

     ave(x, ..., FUN = mean)
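
For illustration, here is a minimal sketch on a made-up data frame (the
values and the names 'first.birth', 'sub' and 'sub2' are hypothetical, not
from my real data). It also checks that the 'tapply' route gives the same
subset when the ids are compared as character strings, which would avoid
the dependence on m.id being numeric:

```r
# Toy data (hypothetical): one row per child, character mother ids
dat <- data.frame(
  m.id = c("a", "a", "b", "b", "c"),
  year = c(1810, 1818, 1816, 1820, 1830),
  stringsAsFactors = FALSE
)

# ave() returns one value per row: the result of FUN within that row's group.
# FUN must be named in full because it follows '...' in the argument list.
first.birth <- ave(dat$year, dat$m.id, FUN = min)

# Keep all children whose mother's first birth was in 1816 or later
sub <- dat[first.birth >= 1816, ]

# The tapply() route, comparing ids as character so the type of m.id
# does not matter
mid  <- tapply(dat$year, dat$m.id, min)
keep <- names(mid)[mid >= 1816]
sub2 <- dat[as.character(dat$m.id) %in% keep, ]
```

Here mother "a" is dropped because her first birth (1810) is before 1816,
including her child born in 1818.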

Thanks Andy for giving me 'ave'! And thanks to Dimitris for his suggestion. 

Göran



