[R] strategy to iterate over repeated measures/longitudinal data

Wed Jul 15 21:08:32 CEST 2009

Hi Group,

First, some example data:

set.seed(1)
wide_data <- data.frame(
    id=c(1:10),
    predictor1 = sample(c("a","b"),10,replace=TRUE),
    predictor2 = sample(c("a","b"),10,replace=TRUE),
    predictor3 = sample(c("a","b"),10,replace=TRUE),
    measurement1=rnorm(10),
    measurement2=rnorm(10))

head(wide_data)

  id predictor1 predictor2 predictor3 measurement1 measurement2
1  1          a          a          b  -0.04493361  -0.05612874
2  2          a          a          a  -0.01619026  -0.15579551
3  3          b          b          b   0.94383621  -1.47075238
4  4          b          a          a   0.82122120  -0.47815006
5  5          a          b          a   0.59390132   0.41794156
6  6          b          a          a   0.91897737   1.35867955

The measurements are repeated measures, and I am looking at one
predictor at a time. In the actual problem, there are around 400,000
predictors (again, one at a time).

Now, I want to use multiple measurements (the responses) to run a
regression of measurements on a predictor. So I will convert this data
from wide to long format.

I want to iterate through each predictor. So one (inefficient) way is
shown below.

For each predictor:
1. create a long data set using the predictor and all measurements
(using the make.univ function from the multilevel package)
2. run model, extract the coefficient of interest
3. go to next predictor

The end result is a vector of 400,000 coefficients.
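
One possible improvement, as a rough sketch: the stacked measurements do not depend on the predictor, so the long skeleton (id, time, measurement) could be built once and only the predictor column swapped on each iteration. The names long_base and wide_row are made up for illustration, and the toy data only mirrors the example above.

```r
# Hypothetical toy data mirroring the example in this post.
set.seed(1)
wide_data <- data.frame(
    id = 1:10,
    predictor1 = sample(c("a", "b"), 10, replace = TRUE),
    predictor2 = sample(c("a", "b"), 10, replace = TRUE),
    measurement1 = rnorm(10),
    measurement2 = rnorm(10))

measurement_names <- c("measurement1", "measurement2")
n_subjects <- nrow(wide_data)
n_times <- length(measurement_names)

# Build the long response skeleton once; it is the same for every predictor.
# unlist() concatenates the measurement columns one after another.
long_base <- data.frame(
    id = rep(wide_data$id, times = n_times),
    time = rep(seq_len(n_times), each = n_subjects),
    measurement = unlist(wide_data[measurement_names], use.names = FALSE))

# For each long row, the row of wide_data it came from.
wide_row <- rep(seq_len(n_subjects), times = n_times)

# Inside the loop over predictors, only this one column would change.
long_base$predictor <- wide_data[["predictor1"]][wide_row]
```

With this layout the per-predictor work is a single indexing operation rather than a full reshape, and only one predictor is held in long format at any time.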

I'm sure this can be improved upon. I will be running this on a Unix
cluster with 16G.
In the wide format, there are 2000 rows (individuals). With 4 repeated
measures, converting everything to long format up front seems like it
could be problematic. Also, I'm not sure how to iterate through the
result (maybe by putting it in a list). Any suggestions?
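
To make the size concern concrete, here is a back-of-the-envelope estimate. It assumes each predictor value takes 4 bytes (R stores factor codes as integers) and ignores all other overhead, so treat the numbers as a lower bound rather than a measurement.

```r
n_subjects <- 2000      # rows in the wide format
n_times <- 4            # repeated measures per individual
n_predictors <- 400000  # predictors, tested one at a time
bytes_per_value <- 4    # assumption: integer factor codes

# All predictors kept in wide format.
wide_gb <- n_subjects * n_predictors * bytes_per_value / 1e9
# All predictors expanded to long format up front.
long_gb <- wide_gb * n_times

wide_gb  # 3.2
long_gb  # 12.8
```

If the 16G refers to memory, expanding every predictor up front would leave very little headroom, which is why converting one predictor at a time seems safer.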

Thanks for your help.

Juliet

Here is the inefficient, working code.

library(multilevel)
library(lme4)

#Same data as above
set.seed(1)
wide_data <- data.frame(
    id=c(1:10),
    predictor1 = sample(c("a","b"),10,replace=TRUE),
    predictor2 = sample(c("a","b"),10,replace=TRUE),
    predictor3 = sample(c("a","b"),10,replace=TRUE),
    measurement1=rnorm(10),
    measurement2=rnorm(10))

#vector of names to iterate over
predictor_names <- colnames(wide_data)[2:4]
#vector to store coefficients
mycoefs <- rep(-1,length(predictor_names))
names(mycoefs) <- predictor_names
for (predictor in predictor_names)
{
    # stack the measurements into long format for this predictor
    long_data <- make.univ(
        data.frame(wide_data$id, wide_data[,predictor]),
        data.frame(wide_data$measurement1, wide_data$measurement2))
    names(long_data) <- c('id', 'predictor', 'time', 'measurement')
    # random intercept for each individual
    myfit <- lmer(measurement ~ predictor + (1|id), data=long_data)
    # fixed effect for the predictor (second entry, after the intercept)
    mycoefs[predictor] <- fixef(myfit)[2]
}

mycoefs