[R] Creating new variable with maximum visit date by group_id

David Winsemius dwinsemius at comcast.net
Thu Aug 25 00:25:08 CEST 2011


On Aug 24, 2011, at 5:15 PM, Kathleen Rollet wrote:

> Dear R users,
>
> I am encountering the following problem: I have a dataset with a
> 'unique_id' and different 'visit_date' values (formatted with
> as.Date, "%d/%m/%Y") per unique_id. I would like to create a new
> variable with the most recent date of visit per unique_id as shown below.

That should not result in what is below unless you have changed
something in options() forcing a different output format. (Is
that even possible?) By default a Date object prints as YYYY-MM-DD.
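
To illustrate the point about output format, here is a minimal sketch (the variable name `d` is only for illustration). A Date prints in ISO form unless it is passed through format(), at which point it becomes a character string:

```r
# Parse a day/month/year string into a Date object
d <- as.Date("01/06/2011", format = "%d/%m/%Y")

print(d)                            # default printing uses YYYY-MM-DD
d_chr <- format(d, "%d/%m/%Y")      # slash format, but now a character string
class(d)
class(d_chr)
```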
>
> unique_id visit_date last_visit_date
> 1  01/06/2010  01/06/2011
> 1  01/01/2011  01/06/2011
> 1  01/06/2011  01/06/2011
> 2  01/01/2009  01/07/2011
> 2  01/06/2009  01/07/2011
> 2  01/06/2010  01/07/2011
> 2  01/01/2011  01/07/2011
> 2  01/07/2011  01/07/2011
> 3  01/01/2008  01/01/2008
> 4  01/01/2009  01/01/2010
> 4  01/01/2010  01/01/2010
>
Read it in as a data frame named "dat" with:
colClasses=c("numeric", "character", "character")

Then:

# origin= is only needed for numeric input, so it is omitted here
dat$visit_date <- as.Date(dat$visit_date, format="%d/%m/%Y")
dat$last_visit_date <- as.Date(dat$last_visit_date, format="%d/%m/%Y")
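
For anyone wanting to reproduce this without a file, the following is a sketch that reads the posted table from a string. It assumes a version of R in which read.table() accepts a text= argument; the name `txt` is made up for the example.

```r
# The posted data, pasted in as a string (header row plus 11 records)
txt <- "unique_id visit_date last_visit_date
1 01/06/2010 01/06/2011
1 01/01/2011 01/06/2011
1 01/06/2011 01/06/2011
2 01/01/2009 01/07/2011
2 01/06/2009 01/07/2011
2 01/06/2010 01/07/2011
2 01/01/2011 01/07/2011
2 01/07/2011 01/07/2011
3 01/01/2008 01/01/2008
4 01/01/2009 01/01/2010
4 01/01/2010 01/01/2010"

# Keep the dates as character on input so they are not turned into factors
dat <- read.table(text = txt, header = TRUE,
                  colClasses = c("numeric", "character", "character"))

# Convert the day/month/year strings to Date objects
dat$visit_date <- as.Date(dat$visit_date, format = "%d/%m/%Y")
dat$last_visit_date <- as.Date(dat$last_visit_date, format = "%d/%m/%Y")
str(dat)
```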

> I know the coding to easily do this in Stata, SAS, and Excel but I
> cannot find how to do it in R. I tried multiple functions such as
> tapply(), ave(), ddply(), and transform() after looking into
> previous postings. The code runs but only NA values are
> generated or I get error messages that the replacement has fewer rows
> than the data has (there are about 1000 unique_ids and over 4000 rows
> in my dataset presently).

The 'ave' function should be able to do it. It returns a vector as
long as the dataframe has rows.
You are asked to post your failures as well as reproducible code and
data, the latter best produced with dput(). (This applies doubly so when
you choose non-standard formats for Date objects.)
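
As a small sketch of what that looks like (the three-row data frame here is invented for the example, not taken from the original dataset), dput() writes out R code that rebuilds the object exactly, classes included:

```r
# A tiny stand-in for the real data
dat <- data.frame(
  unique_id  = c(1, 1, 2),
  visit_date = as.Date(c("2010-06-01", "2011-01-01", "2009-01-01"))
)

# Prints a structure(...) call that can be pasted into a posting
dput(dat)

# Pasting that output back into R recreates the data frame
dat2 <- eval(parse(text = paste(capture.output(dput(dat)), collapse = "\n")))
```

Posting the dput() output lets readers see whether a column is really a Date, a factor, or a character vector.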

Please read:
?dput
?ave

Worked example:

dat$most_recent <- format(ave(dat$visit_date, dat$unique_id, FUN=max),
                          format="%d/%m/%Y")
dat

NOTE: that last column is not an R date but rather a character vector.
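
If a real Date column is wanted instead, one option is to leave off the format() call and only format for display. This is a sketch on a cut-down copy of the posted data; the column name `last_visit_chr` is made up for the example.

```r
# A subset of the posted data, with visit_date already converted to Date
dat <- data.frame(
  unique_id  = c(1, 1, 1, 2, 2, 3),
  visit_date = as.Date(c("01/06/2010", "01/01/2011", "01/06/2011",
                         "01/01/2009", "01/07/2011", "01/01/2008"),
                       format = "%d/%m/%Y")
)

# ave() returns one value per row: the group maximum, still of class Date
dat$last_visit_date <- ave(dat$visit_date, dat$unique_id, FUN = max)
class(dat$last_visit_date)

# A character copy in day/month/year form, for display only
dat$last_visit_chr <- format(dat$last_visit_date, "%d/%m/%Y")
dat
```

Keeping the Date version around means later comparisons and date arithmetic still work; the character copy sorts and compares as text.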



> I would greatly appreciate if someone could help me.
>
> Thank you!
>
> Kathleen R.
> Epidemiologist
> Montreal, QC, Canada 		 	   		

David Winsemius, MD
West Hartford, CT
