[R] data manipulation

Ricardo Pietrobon rpietro at duke.edu
Sun Sep 7 21:32:41 CEST 2003


I am new to R, having used Stata for a few years. Over the last few days I've
been racking my brain and checking several R and S references trying to
solve this data management problem. I have a data set with a unique patient
identifier that is repeated across multiple rows, a variable with the month
of each patient encounter, and a continuous variable for the cost of each
encounter. The data look like this:

ID	date		cost
1	"2001-01"	200.00
1	"2001-01"	123.94
1	"2001-03"	100.23
1	"2001-04"	150.34
2	"2001-03"	296.34
2	"2002-05"	156.36

I would like to obtain the median cost, and boxplots, of the summed cost of
all encounters occurring in the first six months after the index encounter
(the patient's first encounter) for each patient, then the mean and median
of the summed costs from 6 to 12 months after the index encounter, and so
on. Note that the first ID has two encounters in the index month, which
makes it harder to identify a single row as the index encounter.
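One possible approach in R is sketched below, under some assumptions that are mine rather than the post's: dates are "YYYY-MM" strings, the index month counts as month 0, the windows are months 0-5, 6-11, and so on after the index month, and a patient with no encounters in a window simply has no row for it (rather than a zero total). The object names (enc, totals, med, avg) are illustrative only. Taking the minimum month per patient with ave() sidesteps the tie problem, since only the index month is needed, not a single index row.

```r
# Sample data from the table above ("YYYY-MM" month strings)
enc <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)

# Turn "YYYY-MM" into a running month number so differences are in months
enc$month <- as.integer(substr(enc$date, 1, 4)) * 12 +
             as.integer(substr(enc$date, 6, 7))

# Index month = earliest month for each patient; two encounters in the
# first month do not matter because only the month itself is needed
enc$index <- ave(enc$month, enc$ID, FUN = min)

# Assumption: window 0 = months 0-5 after the index month, 1 = months 6-11, ...
enc$window <- (enc$month - enc$index) %/% 6

# Total cost per patient per window (a patient with no encounters in a
# window gets no row there, not a zero)
totals <- aggregate(cost ~ ID + window, data = enc, FUN = sum)

# Median and mean of the per-patient totals within each window
med <- tapply(totals$cost, totals$window, median)
avg <- tapply(totals$cost, totals$window, mean)
print(totals)
print(med)
print(avg)

# One box per window
boxplot(cost ~ window, data = totals,
        xlab = "6-month window after index encounter",
        ylab = "Total cost per patient")
```

With the sample data, patient 1 has all four encounters in window 0, while patient 2's second encounter falls 14 months after the index month and therefore lands in window 2; window 1 has no rows. If patients without encounters in a window should count as zero cost, the per-window totals would need to be filled in explicitly before taking medians.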

Any help would be appreciated,


Ricardo Pietrobon, MD
Assistant Professor of Surgery
Duke University Medical Center
Durham, NC 27710 US
