[R] aggregation with extra columns

Paul Sorenson Paul.Sorenson at vision-bio.com
Wed Feb 2 02:17:32 CET 2005


R People,

Thanks for your help on my recent questions. Excel is never going to disappear from my office, but with graphics from the lattice package and some other stuff in R I have been able to add some value.

I have a problem with aggregation that I haven't been able to figure out. I mentioned it earlier but didn't state it very clearly.

Basically I have many "defect events" and I want to grab the most recent event for each defect number:

e.g.:
"date"      "defectnum"  "state"
2004-12-1   10           create
2004-12-2   11           create
2004-12-4   10           close
2004-12-7   11           fix

to:
"date"      "defectnum"  "state"
2004-12-4   10           close
2004-12-7   11           fix

Now with aggregate I can get the rows I want but not with the state "attached":

aggregate(list(date=ev$date), by=list(defectnum=ev$defectnum), max)

This gives me the rows I want, but I have lost the "state".  I tried doing a merge afterwards, but now I realise why they warned me to avoid using dates as database keys.

What would be handy is somehow getting back the index vector from the aggregate function.  I realize in the general case this wouldn't work for aggregate but in the case of min/max the result is a specific record.
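
To make the "index vector" idea concrete, here is a minimal sketch of what I have in mind, assuming ev is a data frame shaped like the example above. The toy data and the names idx and latest are made up for illustration; tapply and which.max are used here in place of aggregate, so this is not something aggregate itself returns.

```r
# Toy data matching the example; `ev` is the data frame name used in the
# aggregate call, the values are copied from the table in this post.
ev <- data.frame(
  date      = as.Date(c("2004-12-01", "2004-12-02", "2004-12-04", "2004-12-07")),
  defectnum = c(10, 11, 10, 11),
  state     = c("create", "create", "close", "fix"),
  stringsAsFactors = FALSE
)

# For each defect number, pick the row index that holds the latest date.
# `idx` and `latest` are hypothetical names used only in this sketch.
idx <- tapply(seq_len(nrow(ev)), ev$defectnum,
              function(i) i[which.max(as.numeric(ev$date[i]))])

# Index the original data frame so the "state" column stays attached.
latest <- ev[as.vector(idx), ]
print(latest)
```

If two events for the same defect share the same latest date, which.max keeps only the first of them, which may or may not be what is wanted.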

Someone earlier mentioned some tricks with sort but I haven't been able to make that get to where I want.
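
My best guess at what the sort trick might look like is sketched below, assuming the same ev layout as in the example. The toy data and the names sorted and latest_sorted are invented for illustration, and I am not certain this is what the earlier poster meant: order the events by defect number and date, then keep only the last row of each defect number.

```r
# Toy data matching the example; values copied from the table in this post.
ev <- data.frame(
  date      = as.Date(c("2004-12-01", "2004-12-02", "2004-12-04", "2004-12-07")),
  defectnum = c(10, 11, 10, 11),
  state     = c("create", "create", "close", "fix"),
  stringsAsFactors = FALSE
)

# Sort by defect number, then by date within each defect number.
sorted <- ev[order(ev$defectnum, ev$date), ]

# Scanning from the end, the first row seen for each defect number is its
# most recent event; duplicated() flags all the earlier ones.
latest_sorted <- sorted[!duplicated(sorted$defectnum, fromLast = TRUE), ]
print(latest_sorted)
```

This keeps every column of the original rows, so no merge on the date is needed afterwards.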



