[R] LME & data with complicated random & correlational structures

Spencer Graves spencer.graves at pdf.com
Wed Dec 7 03:32:52 CET 2005


	  Have you received any replies to this post?  I haven't seen any, so 
I will attempt a few comments.  First, I'm overwhelmed by the details in 
your discussion.  I suggest that for each of your questions, you try to 
think of an extremely simple example that would test that question, as 
suggested in the Posting Guide (www.R-project.org/posting-guide.html). 
If you have problems with that, please submit a question focused on that 
one thing.

	  Have you looked at Pinheiro and Bates (2000) Mixed-Effects Models in 
S and S-PLUS (Springer)?  If not, I suggest you take a hard look at this 
book.  I was unable to get anything sensible from "lme" until I started 
working through it.  It is the primary reference for "lme", and 
for me at least, it was essential to understanding what I needed to do 
to get anything useful from "lme".  You don't need to follow all the 
math in this book to get something useful out of it, because it includes 
many worked examples that should go a long way toward answering many 
questions you might have about "lme".

	  1.  "if I set something in my model formula as a fixed effect, then 
it does not make sense to set it as a random effect as well? ...

 > Temperature ~ Stimulus-1, random=Subj|Subj/Epoch/Stimulus

	  I perceive several problems here.  Have you tried the following:

 > Temperature ~ Stimulus-1, random=~1|Subj/Epoch
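
	  For example, a minimal call might look like the following (just a 
sketch:  I'm assuming your data frame is called "temps", with columns 
Subj, Epoch, Stimulus, and Temperature;  adjust the names to match 
your data):

library(nlme)
fit1 <- lme(Temperature ~ Stimulus - 1,
            random = ~ 1 | Subj/Epoch,
            data = temps,
            subset = Epoch %in% c("Baseline", "Testing", "Recovery"))
summary(fit1)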

 From your description, I'm guessing that Stimulus is a factor with 
levels like ("Baseline", "A", "B", "Recovery").  My way of thinking 
about these things is to try to write this as an algebraic model with 
parameters to estimate.  "Temperature~Stimulus-1", ignoring the "random" 
argument for the moment, could be written as follows:

(1) Temperature = b["Baseline"]*I(Stimulus=="Baseline")
                + b["A"]*I(Stimulus=="A")
                + b["B"]*I(Stimulus=="B")
                + b["Recovery"]*I(Stimulus=="Recovery"),

	  where the 4 "b" parameters are to be estimated by "iterative, 
reweighted generalized least squares" to maximize an appropriate 
likelihood function [and where I(...) = indicator function that is 1 if 
the (...) is TRUE and 0 otherwise].
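
	  You can check this parameterization with "lm" on a toy data set (a 
sketch with made-up numbers;  "d" is just an illustrative name):

d <- data.frame(Stimulus = factor(rep(c("Baseline", "A", "B",
                                        "Recovery"), each = 5)),
                Temperature = rnorm(20, mean = 30))
coef(lm(Temperature ~ Stimulus - 1, data = d))
# one coefficient per Stimulus level, with no common intercept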

	  Now consider "random=~1|Subj";  ignore "epoch" for the moment.  This 
adds one "random coefficient" to this model for each Sub.  If you have 2 
subjects, then the model becomes something like the following:

(2) Temperature = b["Baseline"]*I(Stimulus=="Baseline") + 
b["A"]*I(Stimulus=="A") + b["B"]*I(Stimulus=="B") +
b["Recovery"]*I(Stimulus=="Recovery") + b.subj[1]*I(Subj==1)
+ b.subj[2]*I(Subj==2).

	  However, we do NOT initially estimate the random coefficients 
b.subj[1] and b.subj[2].  Rather, we assume these coefficients "b.subj" 
are normally distributed with mean 0 and a variance "var.subj", and we 
want to estimate "var.subj".  (We may later estimate the b.subj's 
conditioned on the estimate of "var.subj", but that's a separate issue.) 
  We estimate "var.subj" using "iterative, reweighted generalized least 
squares" that roughly speaking "uncorrelates" or "whitens" the 
correlated residuals from model (1) and then minimizes the sum of 
squares of those "whitened" residuals.  More detail is provided in 
Pinheiro and Bates.
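
	  In "lme" this corresponds to something like (again a sketch, using 
the assumed "temps" data frame from above):

fit2 <- lme(Temperature ~ Stimulus - 1, random = ~ 1 | Subj,
            data = temps)
VarCorr(fit2)  # the Subj intercept variance is the estimate of var.subj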

	  Now consider "random=~1|Subj/Epoch".  This adds another random 
coefficient for each (Subject, Epoch) combination, which we also assume 
are normally distributed with mean 0 and variance "var.s.epoch".  We 
then want to estimate the 4 fixed effect coefficients and the two 
variance coefficients simultaneously, using the same approach I just 
outlined.
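
	  The corresponding call, and the two estimated variances (a sketch):

fit3 <- lme(Temperature ~ Stimulus - 1, random = ~ 1 | Subj/Epoch,
            data = temps)
VarCorr(fit3)  # var.subj, var.s.epoch (Epoch within Subj),
               # and the residual variance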

	> 2- Is it possible to take a piecewise approach wrt the variance using 
lme(), such as modeling the variability of each subject first ... . 
When I try to set up the correlation structure, I run out of memory fast."

	  You've hit on a great idea here:  Consider the data for each subject 
separately.  Plot it, think about the similarities and differences, and 
condense the data into, e.g., one or a few numbers per subject/epoch 
combination.  Then use "lme" on this condensed data set.  I'd start 
simple and add complexity later.  I have on occasion tried to fit the 
most complicated model I could think of, only to find that I could have 
gotten most of the information by condensing the data grossly and 
fitting much simpler models, and that I had wasted lots of time and 
effort trying to work with all the data at once.  I suggest you start 
by aggregating to the grossest level you think might answer your 
research questions.  After you have sensible answers at that level, if 
you still have time and energy for this project, you might then try 
aggregating the data into finer summaries with fewer observations in 
each.
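
	  For example (a sketch;  "temps" as assumed above), you could 
condense to one mean temperature per Subj/Epoch/Stimulus combination 
and fit the model to the condensed data:

tempMeans <- aggregate(list(Temperature = temps$Temperature),
                       by = list(Subj = temps$Subj,
                                 Epoch = temps$Epoch,
                                 Stimulus = temps$Stimulus),
                       FUN = mean)
fitAgg <- lme(Temperature ~ Stimulus - 1, random = ~ 1 | Subj/Epoch,
              data = tempMeans)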

	  3.  Is there a way to get rid of the serial dependency BEFORE running 
the model with "lme", such as initiating a corStruct before placing it 
in the model?

	  Answer:  Yes, and the primary way to do this would be to aggregate, 
as I just suggested.  If you still want to do more with the 
correlation structure, study Pinheiro and Bates and try something with a 
moderately condensed version of what you have.
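
	  For instance, on a moderately condensed data set (say, one 
observation per second;  "tempSec" is an assumed name), an AR(1) 
structure within the innermost grouping might look like:

fitAR <- lme(Temperature ~ Stimulus - 1,
             random = ~ 1 | Subj/Epoch,
             correlation = corAR1(form = ~ 1 | Subj/Epoch),
             data = tempSec)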

	  hope this helps.
	  spencer graves

Keith Chamberlain wrote:

> Dear List,
> 
> This is my first post, and I'm a relatively new R user trying to work out a
> mixed effects model using lme() with random effects and a correlation
> structure, and I have looked over the archives and the R help on lme,
> corClasses, etc. extensively for clues. My programming experience is minimal
> (1 semester of C). My mentor, who has much more programming experience but a
> comparable level of knowledge of R, has been stumped. What follows are 3
> questions pertaining to an lme() model: one on the nested hierarchy, one on
> a strategy for a piecewise approach to the variance given that I have ~24
> hours of data (sampled at 32Hz, 1hr per subject), and one on the corStruct,
> or how to get rid of serial dependencies before lme().
> 
> I'm analyzing skin temperature continuously recorded at 32Hz in Baseline (10
> min), Testing (~5 min), and Recovery (20 min) epochs of a face recognition
> experiment. Stimuli are the same in Baseline and Recovery (portrait or
> landscape), and in Testing, participants were tested on their recognition of
> a list of b&w portraits presented just before testing started. On some of
> the portraits 'learned', the eyes were masked, and in others, the eyes were
> visible. In Testing, the portraits have no masking, but the stimuli in
> Testing are labeled "Eyes" and "NoEyes". The data structure looks as
> follows:
> 
> Subj/Epoch/Stimuli/Time/Temperature
> There are 8 subjects
> 
> 9 epochs - 6 of which were just "instruction" blocks, and one "Learning"
> period. Wrt lme(), I figured out how to use subset to isolate just the
> Baseline, Learning, and Testing Epochs (and avoid epochs with only 1
> stimulus level, such as "instruction"). Data within each epoch are balanced
> wrt # trials, but not between epochs. Recovery has twice as many trials as
> Baseline, and Testing has about half. Time for each epoch is roughly in that
> ratio too, although time in each trial differs.
> 
> Stimuli are the same in Baseline & Recovery, but different in Testing,
> although there are 2 levels in each used epoch.
> 
> Time & Temperature make up the time series, and Temperature is the dependent
> variable wrt stimulus. 
> 
> 1- Are fixed effects and random effects discrete? That is, if I set
> something in my model formula as a fixed effect, then it does not make sense
> to set it as a random effect as well? The documentation (and posts) were not
> really clear on that point (not that the documentation technically 'should'
> be, per se; just that I got confused).
> 
> The nested hierarchy for what actually gets analyzed looks as follows:
> Subj/Epoch/Stimulus/Temperature
> 
> Reasoning: there are several temperature samples recorded in each trial of
> Stimulus. Several stimuli in each Epoch, and all of the Epochs for one
> subject. Subject is random (theoretically) because of sampling in a
> population, Epoch would be fixed because all participants went through the
> same sequence of Epochs, but Stimulus varied randomly within an Epoch, which
> seems inconsistent when I apply it to the lme model as both a fixed and
> random effect.
> 
> Temperature ~ Stimulus-1, random=Subj|Subj/Epoch/Stimulus
> Subset= Epoch=="Baseline" | Epoch=="Testing" | Epoch=="Recovery"
> 
> I'm looking to correctly allocate error terms for between subjects (Subj)
> variability, and further delineate the within subject error between Epoch
> and Stimulus. The current model that I got to work (memory issues namely) is
> Temperature ~ Stimulus-1, random=Subj|Subj, which I decided to use to get
> the residuals to have the Subject variability accounted for and subtracted.
> Would a list of random structures work better? If so, is each item in the
> list structured just as the random formula? I haven't actually seen/found
> any examples of a list of random/nesting structures.
> 
> 2- Is it possible to take a piecewise approach wrt the variance using lme(),
> such as modeling the variability of each subject first, then using
> further-nested terms in a model and the residuals from the previous? If so,
> what caveats exist for interpreting the variances?
> 
> I'm not interpreting p-values at this point because of another issue. When I
> try to set up the correlation structure, I run out of memory fast. I've
> tried this on a Mac G5, an HP Pavilion dv1000 (= Pentium 2.6GHz), and a
> Gateway with an AMD Athlon 900MHz processor. Each system has 386M of memory
> or more; one of them has 1G. 
> 
> 3- Is there a way to get rid of the serial dependency BEFORE running the
> model with lme(), such as initiating a corStruct before placing it in the
> model? I'm working with so much data that I'm fine with doing the process
> piecewise. An AR process was difficult because the residuals are not the
> same length as the data file that I started with. Serial dependencies still
> gotta go, whether via the correlation term in lme() or some other method,
> because I'll soon be breaking up the variance into components via
> spectrum().
> 
> So I might as well add a 4th: what's the term that gets me to the data
> after AR() has done its work? I'm thinking that resid() wasn't right, but
> the fact that the data differ from their original length after an AR
> process may be how it's done.
> 
> Rgds,
> KeithC.
> Psych Undergrad, CU Boulder
> RE McNair Scholar
> 

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915



