[R] regression on data subsets in datafile

Dennis Murphy djmuser at gmail.com
Mon Sep 12 11:15:08 CEST 2011


Hi:

Here's one approach:

# Date typo fixed in record 5: changed 8/35/01 to 8/5/01
tC <- textConnection("
Subject Date    parameter1
bob     3/2/99  10
bob     4/2/99  10
bob     5/5/99  10
bob     6/27/99 NA
bob     8/5/01 10
bob     3/2/02  10
steve   1/2/99  4
steve   2/2/00  7
steve   3/2/01  10
steve   4/2/02  NA
steve   5/2/03  16
kevin   6/5/04  24
")
dat <- read.table(tC, header=TRUE, stringsAsFactors = FALSE)
close(tC)
rm(tC)
# Convert Date to an object of class Date
dat <- transform(dat, date = as.Date(Date, format = '%m/%d/%y'))
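
Here is a quick check of the %m/%d/%y format used above. It is a sketch, and d1, pivot and bad are only illustrative names. According to R's ?strptime, %y maps the two-digit years 00-68 to 20xx and 69-99 to 19xx. An impossible day such as the original 8/35/01 parses to NA instead of raising an error, so a typo like that would silently become a missing date:

```r
# The %m/%d/%y format used for the thread's dates
d1 <- as.Date(c("3/2/99", "8/5/01", "6/5/04"), format = "%m/%d/%y")
print(d1)

# Two-digit year pivot: 00-68 -> 20xx, 69-99 -> 19xx (see ?strptime)
pivot <- as.Date(c("1/1/68", "1/1/69"), format = "%m/%d/%y")
print(pivot)

# Day 35 does not exist, so the string parses to NA without an error
bad <- as.Date("8/35/01", format = "%m/%d/%y")
print(bad)
```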

# You could do this with transform() and by(), but here is another way,
# using package plyr, to make each person's earliest date time 0.
# mutate() is a faster alternative to transform() and can be used for
# groupwise operations inside ddply():
library('plyr')
dat <- ddply(dat, .(Subject), mutate, days = as.numeric(date - min(date)))
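
If you would rather not depend on plyr, base R's ave() can compute the same days-since-first-date column. This is a swapped-in technique, not the one the script above uses. The four-row data frame d2 is a hypothetical subset of the example data. Unlike ddply(), ave() keeps the original row order:

```r
# Hypothetical subset of the example data (three bob rows, one kevin row)
d2 <- data.frame(
  Subject = c("bob", "bob", "bob", "kevin"),
  date    = as.Date(c("1999-03-02", "1999-04-02", "1999-05-05", "2004-06-05")),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each Subject and returns a vector aligned with the
# original rows, so each subject's earliest date becomes day 0
d2$days <- ave(as.numeric(d2$date), d2$Subject, FUN = function(x) x - min(x))
print(d2)
```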

# Since Kevin has only one record, we want to return NAs for his coefficients.
# The function f returns NA coefficients if there are fewer than three
# observations per subgroup (you can change 3 to 2 if you like); otherwise
# it returns the intercept and slope of the least squares line as a data frame.

f <- function(d) {
  if (nrow(d) < 3) {
    return(data.frame(intercept = NA, slope = NA))
  } else {
    p <- coef(lm(parameter1 ~ days, data = d))
    data.frame(intercept = p[1], slope = p[2])
  }
}
# Apply the function to each person's sub-data frame
ddply(dat, .(Subject), f)
  Subject intercept       slope
1     bob 10.000000 0.000000000
2   kevin        NA          NA
3   steve  3.998485 0.007591638
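
For comparison, here is a base-R sketch that is closer to the split()/lapply() idea in the original question. It also answers the second question by merging each subject's slope back onto the data as a new column. fit_one(), coefs, dd and dd_slopes are illustrative names. The three-row threshold copies the rule in f():

```r
# Self-contained copy of the thread's data (8/5/01 typo already fixed)
dd <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Date    parameter1
bob     3/2/99  10
bob     4/2/99  10
bob     5/5/99  10
bob     6/27/99 NA
bob     8/5/01  10
bob     3/2/02  10
steve   1/2/99  4
steve   2/2/00  7
steve   3/2/01  10
steve   4/2/02  NA
steve   5/2/03  16
kevin   6/5/04  24
")
dd$date <- as.Date(dd$Date, format = "%m/%d/%y")
# Days since each subject's first date, without plyr
dd$days <- ave(as.numeric(dd$date), dd$Subject, FUN = function(x) x - min(x))

# Hypothetical helper: same rule as f(), but returns a named numeric vector
fit_one <- function(d) {
  if (nrow(d) < 3) return(c(intercept = NA_real_, slope = NA_real_))
  p <- coef(lm(parameter1 ~ days, data = d))  # NA rows dropped by lm's default na.omit
  c(intercept = unname(p[1]), slope = unname(p[2]))
}

# split() + lapply() fits one line per subject; do.call(rbind, ...) stacks the
# vectors into a matrix whose row names are the subject names
coefs <- do.call(rbind, lapply(split(dd, dd$Subject), fit_one))
coefs <- data.frame(Subject = rownames(coefs), coefs, row.names = NULL)
print(coefs)

# merge() copies each subject's slope onto every row for that subject
# (the result comes back sorted by Subject)
dd_slopes <- merge(dd, coefs[, c("Subject", "slope")], by = "Subject")
print(dd_slopes)
```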

Another option is to use the lmList() function in the nlme package.
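
Here is a minimal lmList() sketch. It assumes nlme is available, which is a recommended package that normally ships with R. dl, fits and cf are illustrative names. Note that lmList() defaults to na.action = na.fail, so the NA rows have to be dropped explicitly. It also has no three-observation rule, so kevin's single row still gets a fit whose slope is NA:

```r
library(nlme)  # recommended package, normally installed with R

dl <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Subject Date    parameter1
bob     3/2/99  10
bob     4/2/99  10
bob     5/5/99  10
bob     6/27/99 NA
bob     8/5/01  10
bob     3/2/02  10
steve   1/2/99  4
steve   2/2/00  7
steve   3/2/01  10
steve   4/2/02  NA
steve   5/2/03  16
kevin   6/5/04  24
")
dl$date <- as.Date(dl$Date, format = "%m/%d/%y")
dl$days <- ave(as.numeric(dl$date), dl$Subject, FUN = function(x) x - min(x))
dl$Subject <- factor(dl$Subject)

# "| Subject" names the grouping factor; na.omit drops the NA rows first
fits <- lmList(parameter1 ~ days | Subject, data = dl, na.action = na.omit)
cf <- coef(fits)  # one row per subject: (Intercept) and days
print(cf)
```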

HTH,
Dennis


On Mon, Sep 12, 2011 at 12:42 AM, marcel <marcelcurlin at gmail.com> wrote:
> I have data of the form
>
> tC <- textConnection("
> Subject Date    parameter1
> bob     3/2/99  10
> bob     4/2/99  10
> bob     5/5/99  10
> bob     6/27/99 NA
> bob     8/35/01 10
> bob     3/2/02  10
> steve   1/2/99  4
> steve   2/2/00  7
> steve   3/2/01  10
> steve   4/2/02  NA
> steve   5/2/03  16
> kevin   6/5/04  24
> ")
> data <- read.table(header=TRUE, tC)
> close.connection(tC)
> rm(tC)
>
> I am trying to calculate rate of change of parameter1 in units/day for each
> person. I think I need something like:
> "lapply(split(mydata, mydata$ppt), function(x) lm(parameter1 ~ day,
> data=x))"
>
> I am not sure how to handle the dates in order to have the first day for
> each person be time = 0, and the remaining dates to be handled as days since
> time 0. Also, is there a way to add the resulting slopes to the data set as
> a new column?
>
> Thanks,
> Marcel
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


