[R] data manipulation

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Sun Sep 7 22:34:18 CEST 2003


Ricardo Pietrobon <rpietro at duke.edu> writes:

> ID	date		cost
> 1	"2001-01"	200.00
> 1	"2001-01"	123.94
> 1	"2001-03"	100.23
> 1	"2001-04"	150.34
> 2	"2001-03"	296.34
> 2	"2002-05"	156.36
> 
> 
> I would like to obtain the median costs and boxplots for the sum of
> encounters happening in the first six months after the index encounter
> (first patient encounter) for each patient, then the mean and median costs
> for the costs happening from 6 to 12 months after the index encounter, and
> so on. Notice that the first ID has two encounters during the index date,
> making it more difficult to define a single row with the index encounter.
> 
> Any help would be appreciated,

Let's see... You're going to need a bit of ugliness to convert the
date to a numeric month number. Something like this (NB: that's code
for "I didn't actually try this"...):

attach(yourdata)
# "YYYY-MM" -> 12*year + month; as.character() guards against factors
monthnum <- sapply(strsplit(as.character(date), "-"),
                   function(x) sum(as.numeric(x) * c(12, 1)))
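To check the conversion, here is a small sketch on the sample rows from the question, without attach(). The helper name `to.monthnum` is hypothetical, and the dates are assumed to be "YYYY-MM" strings (possibly stored as a factor, hence `as.character()`):

```r
# Sample data from the question
yourdata <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "YYYY-MM" -> 12 * year + month, so that
# differences between two results are differences in months
to.monthnum <- function(d)
  sapply(strsplit(as.character(d), "-"),
         function(x) sum(as.numeric(x) * c(12, 1)))

monthnum <- to.monthnum(yourdata$date)
monthnum   # 2001-01 -> 24013, 2002-05 -> 24029
```

Only differences between month numbers matter later, so the large absolute values are harmless.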

Then we need a table of the index dates for each person. Taking the
minimum month per ID also takes care of ties such as ID 1's two
encounters in 2001-01:

tbl <- tapply(monthnum, ID, min)
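A minimal sketch of this step on the question's sample rows, with the month numbers typed in directly (12 * year + month) so it runs on its own:

```r
# IDs and month numbers for the six sample rows
ID       <- c(1, 1, 1, 1, 2, 2)
monthnum <- c(24013, 24013, 24015, 24016, 24015, 24029)

# Earliest month per patient = index month. ID 1 has two rows in
# month 24013, but min() still returns a single value for it.
tbl <- tapply(monthnum, ID, min)
tbl   # 24013 for ID "1", 24015 for ID "2"
```

The result is a one-dimensional array whose names are the IDs as character strings.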

Now subtract the index date from monthnum

# index tbl by name, so IDs need not be 1, 2, ..., n
months.post.index <- monthnum - tbl[as.character(ID)]
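Looking up the index month by name rather than by position matters when the IDs are not simply 1, 2, ..., n. A small sketch with hypothetical IDs 101 and 205:

```r
# Hypothetical patient IDs that are not consecutive integers
ID       <- c(101, 101, 205, 205)
monthnum <- c(24013, 24015, 24015, 24029)

tbl <- tapply(monthnum, ID, min)   # names "101" and "205"

# tbl[ID] would treat 101 and 205 as positions in a length-2 array;
# indexing by the character names matches each row to its patient
months.post.index <- monthnum - as.vector(tbl[as.character(ID)])
months.post.index   # 0 2 0 14
```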

Then you probably want to take the subset of your original data frame
that falls within six months of the index date, and do the sums:

total.cost.6mo <- with(subset(yourdata,months.post.index < 6), 
                       tapply(cost,ID,sum))
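Putting the steps together on the question's sample data gives a self-contained sketch. It avoids attach() by storing `months.post.index` as a column of `yourdata` (an assumption of this sketch, so that subset() finds it inside the data frame):

```r
yourdata <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)

# Month number, index month per patient, months since index
monthnum <- sapply(strsplit(as.character(yourdata$date), "-"),
                   function(x) sum(as.numeric(x) * c(12, 1)))
tbl <- tapply(monthnum, yourdata$ID, min)
yourdata$months.post.index <-
  monthnum - as.vector(tbl[as.character(yourdata$ID)])

# Total cost per patient over months 0-5 after the index encounter
total.cost.6mo <- with(subset(yourdata, months.post.index < 6),
                       tapply(cost, ID, sum))
total.cost.6mo           # ID 1: 574.51, ID 2: 296.34
median(total.cost.6mo)   # 435.425
```

Patient 2's 2002-05 encounter is 14 months after the index, so it is excluded from the six-month total.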

and finally

boxplot(total.cost.6mo)
median(total.cost.6mo)

(You could elaborate by converting months.post.index with cut() and
using lapply(names(period),.....) to give you a list of tables, which
boxplot() can plot directly.)
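One possible way to fill in that elaboration, sketched on the question's sample data. Here `period` is assumed to be the factor returned by cut(), so the sketch loops over `levels(period)`; the names `upper` and `by.period` are illustrative, and this is not necessarily what the parenthetical had in mind:

```r
yourdata <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)
monthnum <- sapply(strsplit(yourdata$date, "-"),
                   function(x) sum(as.numeric(x) * c(12, 1)))
tbl <- tapply(monthnum, yourdata$ID, min)
months.post.index <- monthnum - as.vector(tbl[as.character(yourdata$ID)])

# Half-year periods [0,6), [6,12), [12,18), ...; right = FALSE puts
# month 6 into the second period rather than the first
upper  <- 6 * ceiling((max(months.post.index) + 1) / 6)
period <- cut(months.post.index, breaks = seq(0, upper, by = 6), right = FALSE)
period <- factor(period)   # drop periods with no encounters at all

# One vector of per-patient totals for each period
by.period <- lapply(levels(period), function(p) {
  rows <- period == p
  sapply(split(yourdata$cost[rows], yourdata$ID[rows]), sum)
})
names(by.period) <- levels(period)

sapply(by.period, median)   # median per-patient total in each period
boxplot(by.period)          # one box per period
```

As in the six-month version, a patient with no encounters in a period simply does not appear in that period's vector; if such patients should count as zero cost, they would have to be added explicitly.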
-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907




More information about the R-help mailing list