[R] Re: [S] LME - fixed effects model and missing values

Wed Nov 5 01:16:32 CET 2003

Here is an answer to a 1999 post.  I didn't find a direct answer anywhere
on the Web, perhaps because it is "obvious"  once one sees it.

Suppose you have data from a longitudinal study, where each subject was
measured *up to* four times, with missing measurements, so that the data
look like this:

> MAT<- structure(c(23, 24, 6, 19, 16, 20, 13, 11, NA, 8, NA, 21, 19, 15,
11, NA, 10, 12, 16, 30, 13, 16, NA, NA, NA, NA, 28, 6), .Dim = c(7, 4),
.Dimnames = list(c("SUB1", "SUB2", "SUB3", "SUB4", "SUB5", "SUB6",
"SUB7"), c("BDI0", "BDI3", "BDI6", "BDI12")))

> MAT
     BDI0 BDI3 BDI6 BDI12
SUB1   23   11   11    16
SUB2   24   NA   NA    NA
SUB3    6    8   10    NA
SUB4   19   NA   12    NA
SUB5   16   21   16    NA
SUB6   20   19   30    28
SUB7   13   15   13     6

Let

> x<- c(0,3,6,12)

Then it's a simple matter to regress each row of MAT on x, in spite of
missing values.  Take for instance a row that has three missing values:

> lm(MAT["SUB2",] ~ x)

Call:
lm(formula = MAT["SUB2", ] ~ x)

Coefficients:
(Intercept)            x
         24           NA

But now suppose we want to fit a linear mixed-effects model, borrowing
strength from all the data in estimating each individual's intercept
and slope.

As in the naive lm approach, NAs are not a problem.  Unlike the naive lm
approach, however, we obtain values for slope and intercept even for
subjects for whom we cannot get slope via the lm approach, such as SUB2.

First, you must create a data frame with one row for each time point
within each subject, and you need to create a column to keep track of the
time variable.  For instance:

d<-data.frame(sub=I(rep("", length(MAT))), time=rep(NA, length(MAT)), BDI=rep(NA, length(MAT)) )

count<-0
for (i in 1:nrow(MAT)) for (j in 1:ncol(MAT)) {
	count<-count+1
	d[count, "sub"]<- dimnames(MAT) [[1]][i]
	d[count, "time"]<- c(0,3,6,12) [j]
	d[count, "BDI"]<- MAT[i,j]
	}

d$sub<-factor(d$sub)

Then you call lme, telling it to throw away rows with NAs in them.  Note
that this is *not* complete case analysis.  We are not throwing away
subjects (cases) who happen to have a missing BDI, just the time points
that have missing BDI.

> library("nlme")
> mymodel<-lme( BDI ~ time, random=~ 1 + time | sub, data=d, na.action=na.omit)
> mymodel
Linear mixed-effects model fit by REML
  Data: d
  Log-restricted-likelihood: -63.70296
  Fixed: BDI ~ time
(Intercept)        time
 16.8317095  -0.1449387

Random effects:
 Formula: ~1 + time | sub
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr
(Intercept) 3.5621406 (Intr)
time        0.4671455 0.886
Residual    4.1972943

Number of Observations: 21
Number of Groups: 7

We even get a slope for SUB2, for whom we have only one observation.

> mymodel$coef$random
$sub
     (Intercept)        time
SUB1  -0.4909769 -0.08181262
SUB2   3.0012854  0.34872811
SUB3  -4.5733486 -0.52161792
SUB4  -0.7908920 -0.13009388
SUB5   0.7516454  0.09528318
SUB6   4.4620873  0.61630713
SUB7  -2.3598006 -0.32679398

(Version: R 1.8.0 on Windows.)

This responds to the following email, which may be found at

http://www.biostat.wustl.edu/archives/html/s-news/1999-01/msg00122.html

Jake Wegelin

On Wed, 20 Jan 1999, ziv shkedy wrote:

>
> Dear Jose and all the other S+ users,
> First I want to thank you and Prof. Brian Ripley for your helpful answers.
>
> I'll described the missing values problem in more details.
> I'm modeling longitudinal data where each one of the subjects was measured
> at few time points.
> Some of the subjects dropout from the study, i.e. that the RESPONSE of these
> subjects is missing.
> The analysis in proc MIXED is likelihood based ignorable analysis (i.e. we
> using all available cases at each time point),
> in lme ,if i understood you correctly, if the response is missing then the
> observation is omitted from the dataset. This is a complete case analysis.
> My question is if there is a way to overcome this problem and to use all
> observation at each time point (and in that way the output from proc MIXED
> and lme will be the same).
>
> Now, I'm on my way to get the gls function.
> Thank you once again for your help, Ziv.
>
> =========================================================
> Ziv Shkedy
> Biostatistics
> Center for Statistics
> Limburgs Universitair Centrum
> Universitaire Campus, department WNI
> B-3590 Diepenbeek, Belgium
> Tel: +32-(0)11-26.82.57
> e-mail: ziv.shkedy at luc.ac.be
> =========================================================
>