[R] plotting and coloring longitudinal data with three time points (ggplot2)

Hadley Wickham hadley at rice.edu
Wed Dec 7 15:01:05 CET 2011


On Wed, Dec 7, 2011 at 4:02 AM, Eric Fail <eric.fail at gmx.us> wrote:
>  Dear list,
>
> I have been struggling with this for some time now, and for the last hour I have been struggling to make a working example for the list. I hope someone out there has some experience with plotting longitudinal data that they will share.
>
> My data is some patient data with three different time stamps. First the patients are identified at different times (time stamp 1). Second, they go through an assessment phase and begin their treatment (time stamp 2). Finally they are discharged from the hospital at some point (time stamp 3).
>
> I would like to make a spaghetti plot with the assessment phase in one color and the treatment phase in another color.
>
> I used ggplot2, and with this example data and only two time points it works fine (I call it my working example):
>
> library(ggplot2)
> df <- data.frame(
>   date = seq(Sys.Date(), len=104, by="1 day")[sample(104, 52)],
>    patient = factor(rep(1:26, 2), labels = LETTERS)
>  )
> df <- df[order(df$date), ]
> dt <- qplot(date, patient, data=df, geom="line")
> dt + scale_x_date()
> df[ which(df$patient=='E'), c("patient", "date")]
>
> But if I have three time points, R, for some reason I do not yet understand, adds the two second time points in some funny way.
>
> Finally, once that is solved, how do I colorize the different parts of the line so the assessment phase gets one color and the treatment phase another?
>
> I want to be able to show how long we have been in contact with our patients, how much of the contact time was assessment and how much was actual treatment.
>
> Below is an example (I call it the not-working example)
>
> df2 <- data.frame(
>   date2 = seq(Sys.Date(), len= 156, by="2 day")[sample(156, 78)],
>   patient2 = factor(rep(1:26, 3), labels = LETTERS)
>  )
>
> df2 <- df2[order(df2$date2), ]
> dt2 <- qplot(date2, patient2, data=df2, geom="line")
> dt2 + scale_x_date(major="months", minor="weeks")
> df2[ which(df2$patient2=='B'), c("patient2", "date2")]

Did you mean something like this?

library(ggplot2)
library(plyr)

df2 <- data.frame(
  date2 = seq(Sys.Date(), len= 156, by="2 day")[sample(156, 78)],
  patient2 = factor(rep(1:26, 3), labels = LETTERS)
)

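# Number each patient's visits in date order (1 = earliest).
# rank() gives each date's position; order() would return the sorting
# permutation instead, which mislabels some visits.
df2 <- ddply(df2, "patient2", mutate, visit = rank(date2))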

qplot(date2, patient2, data = df2, geom = "line") +
  geom_point(aes(colour = factor(visit)))

# or this?

library(ggplot2)
library(plyr)

df2 <- data.frame(
  date2 = seq(Sys.Date(), len= 156, by="2 day")[sample(156, 78)],
  patient2 = factor(rep(1:26, 3), labels = LETTERS)
)

# rank() gives the visit number within each patient (1 = earliest date)
df2 <- ddply(df2, "patient2", mutate, visit = rank(date2))

qplot(date2, patient2, data = df2, geom = "line",
  colour = factor(visit), group = patient2)

# Obviously the lines are drawn between the observations, so each piece
# takes the colour of the visit it starts from and you only see the
# colours of the first two visits.
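
If the goal is one colour per phase rather than per visit, one possible approach (a minimal sketch, not the only way) is to reshape the data to one row per phase and draw each row with geom_segment(). The sketch assumes every patient has exactly three dates, and the names `segs`, `start`, `end` and `phase`, the phase labels and the fixed seed are all introduced here purely for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only so the example is reproducible

df2 <- data.frame(
  date2 = seq(Sys.Date(), len = 156, by = "2 day")[sample(156, 78)],
  patient2 = factor(rep(1:26, 3), labels = LETTERS)
)

# Sort by patient, then by date, so each patient's three dates are in time order
df2 <- df2[order(df2$patient2, df2$date2), ]

# One row per phase: date 1 -> date 2 (assessment), date 2 -> date 3 (treatment)
segs <- do.call(rbind, lapply(split(df2, df2$patient2), function(d) {
  data.frame(
    patient2 = d$patient2[1:2],
    start = d$date2[1:2],
    end = d$date2[2:3],
    phase = factor(c("assessment", "treatment"))
  )
}))

# Each phase is a horizontal segment on the patient's row, coloured by phase
ggplot(segs, aes(x = start, xend = end, y = patient2, yend = patient2,
                 colour = phase)) +
  geom_segment()
```

The length of each segment then shows how long that phase lasted, and the two segments together show the total contact time for the patient.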

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


