[R] LME & data with complicated random & correlational structures

Thu Dec 1 22:44:48 CET 2005

Dear List,

This is my first post, and I'm a relatively new R user trying to work out a
mixed effects model using lme() with random effects, and a correlation
structure, and have looked over the archives, & R help on lme, corClasses, &
etc extensively for clues. My programming experience is minimal (1 semester
of C). My mentor, who has much more programming experience, but a comparable
level of knowledge of R, has been stumped. What follows are 3 questions
pertaining to an lme() model, one on the nested hierarcy, 1 on a strategy
for a piecewise approach to the variance given I have ~24 hours of data
(sampled at 32Hz, 1hr per subject), and one on the corStruct or how to get
rid of serial dependencies before lme().

I'm analyzing skin temperature continuously recorded at 32Hz in Baseline (10
min), Testing (~5 min), and Recovery (20 min) epochs of a face recognition
experiment. Stimuli are the same in Baseline and Recovery (portrait or
landscape), and in testing, participants were tested on their recognition of
a list b&w portraits presented just before testing started. On some of the
portraits 'learned' the eyes were masked, and in others, the eyes were
visible. In testing, the portraits have no masking but the stimuli in
testing are labeled "Eyes" and "NoEyes". The data structure looks as
follows:

Subj/Epoch/Stimuli/Time/Temperature
There are 8 subjects

9 epochs - 6 of which were just "instruction" blocks, and one "Learning"
period. Wrt lme(), I figured out how to use subset too isolate just the
Baseline, Learning, and Testing Epochs (and avoid epochs with only 1
stimulus level, such as "instruction"). Data within each epoch are balanced
wrt # trials, but not between epochs. Recovery has twice as many trials as
Baseline, and Testing has about half. Time for each epoch is roughly that
ratio too, although time in each trial differs.

Stimuli are the same in Baseline & Recovery, but different in Testing,
although there are 2 levels in each used epoch.

Time & Temperature make up the time series, and Temperature is the dependent
variable too stimulus. 

1- are fixed effects and random effects discrete? That is, if I set
something in my model formula as a fixed effect, then it does not make sense
to set it as a random effect as well? The documentation (and posts) were not
really clear on that point (not that the documentation technically 'should'
be per say, just that I got confused).

The nested hierarchy for what actually gets analyzed looks as follows:
Subj/Epoch/Stimulus/Temperature

Reasoning: there are several temperature samples recorded in each trial of
Stimulus. Several stimuli in each Epoch, and all of the Epochs for one
subject. Subject is random (theoretically) because of sampling in a
population, Epoch would be fixed because all participants went through the
same sequence of Epochs, but Stimulus varied randomly within an Epoch, which
seems inconsistent when I apply it to the lme model as both a fixed and
random effect.

Temperature ~ Stimulus-1, random=Subj|Subj/Epoch/Stimulus
Subset= Epoch=="Baseline" | Epoch=="Testing" | Epoch=="Recovery"

I'm looking to correctly allocate error terms for between subjects (Subj)
variability, and further delineate the within subject error between Epoch
and Stimulus. The current model that I got to work (memory issues namely) is
Temperature ~ Stimulus-1, random=Subj|Subj, which I decided to use to get
the residuals to have the Subject variability accounted for and subtracted.
Would a list of random structures work better? If so, is each item in the
list structured just as the random formula? I haven't actually seen/found
any examples of a list of random/nesting structures.

2- Is it possible to take a piecewise approach wrt the variance using lme(),
such as modeling the variability of each subject first, then using
further-nested terms in a model and the residuals from the previous? If so,
what caveats exist for interpreting the variances?

I'm not interpreting p-values at this point because of another issue. When I
try to set up the correlation structure, I run out of memory fast. I've
tried this on a mac G5, an HP Pavilion dv1000 (= Pentium 2.6GHz), and a
Gateway with an AMD athalon 900MHz processors. Each system has 386M memory
or more, one of which has 1G. 

3- Is there a way to get rid of the serial dependency BEFORE running the
model with LME(), such as initiating a corStruct before placing it in the
model? I'm working with so much data that I'm fine with doing the process
piecewise. An AR process was difficult because the residuals are not the
same length as the data file that I started with. Serial dependencies still
gota go, whether via the correlation term in lme() or some other method,
because I'll soon be breaking up the variance into components via
spectrum().

So I might as well add a 4th. What's the term that gets me too data after
AR() has done it's work? I'm thinking that resid() wasn't right but data
that the data differ from their original length prior to an AR process may
be how its done.

Rgds,
KeithC.
Psych Undergrad, CU Boulder
RE McNair Scholar