[R] Syntax for lme function to model random factors and interactions

Joshua Wiley jwiley.psych at gmail.com
Tue May 22 18:07:39 CEST 2012


See inline

On Mon, May 21, 2012 at 11:17 AM, i_like_macs <dkoya at mac.com> wrote:
> Hello Joshua,
>
> Many thanks for your help, especially from a fellow Bruin (I went there as
> an undergrad!).
>
> I understand that there is another forum for mixed models. If my problem
> can't be solved within this thread, I'll have to go there. I do understand
> some theory about mixed models, but obviously am far from an expert.
>
> My question is not so much statistical advice, as it concerns the correct
> syntax to include random factors and interactions (which include these
> random factors) for the lme function. Maybe it's because I'm used to SPSS,
> but I find R very difficult to use, even after looking up its built-in help.
>
> I could run your neat code suggestion:
>
> lme(Y ~ (A + B + C + D)^3, data = myData, random = ~ 1 | C, method = "ML")
>
> but would like to know how to also include "D" as a random factor. My
> understanding is that the "random" argument for the lme function is coded
> as:
>
> ~ x1 + ... + xn | g1 / ... / gm

So in this case x1 ... xn are the random effects, and those effects
are allowed to vary across the levels of g1 ... gm.

>
> where the left side describes the model for random effects, and the right
> side describes the grouping structure. Reading other posts, I learned that I
> need both sides for the code to run without errors. However, it's not clear

That is correct: the left side is the random effect (random intercepts
and/or random slopes), and the right side is whatever variable codes
the levels that the random effect can vary across.


> to me what both sides represent. The left side appears to be where the
> random factors are specified, perhaps like this:
>
> random = ~ C + D

So this would indicate that C and D are random effects, but you will
still need something (a grouping variable after a |) to indicate what
they get to vary across.

>
> But then this results in errors. Does this mean I have to somehow join the
> two following lines of code to specify both random factors?
>
> random = ~ 1 | C
> random = ~ 1 | D

I do not think that is what you want. Besides, I do not think that
lme() allows multiple random arguments, though I could be wrong because
I work with lmer more than lme.
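
For what it is worth, here is a minimal sketch of how more than one
grouping factor can go into the single random argument when the
factors are nested. It uses the Oats data that ships with nlme
(varieties within blocks), not your data, and I am only assuming that
nesting is a sensible structure for illustration. It may well not be
the right structure for your C and D.

```r
library(nlme)

# Oats: yield measured at several nitrogen levels on plots, with
# Variety nested within Block (example data shipped with nlme).
# Nested random intercepts via the g1/g2 syntax:
fit_nested <- lme(yield ~ nitro, data = Oats,
                  random = ~ 1 | Block/Variety)

# The same structure written as a named list of one-sided formulas;
# the order of the list elements gives the order of nesting.
fit_list <- lme(yield ~ nitro, data = Oats,
                random = list(Block = ~ 1, Variety = ~ 1))

# One variance component per grouping level, plus the residual
VarCorr(fit_nested)
```

Both calls should describe the same model, so either form can be
used. Neither of them is two separate random arguments.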

>
> It's not clear what the "~ 1" represents here, as I would have guessed that
> this is where the random factors would be specified. Is this related to an
> intercept-only model?

Yes, ~ 1 | C means that there is a random intercept for each level of
C.  If you do this, you will get an estimate of the average intercept
in your model, and you will also get an estimate of the variance of
the intercepts.  Technically, the intercepts are assumed to come from
a normal distribution, and you get the maximum likelihood estimates of
the mean and variance (or standard deviation) of that distribution.
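
As a small illustration of where those two pieces show up in the
output, here is a sketch using the Orthodont data that ships with nlme
(my choice of example, nothing to do with your A, B, C, and D):

```r
library(nlme)

# Random intercept for each Subject; ML to match the earlier call
fit_ri <- lme(distance ~ age, data = Orthodont,
              random = ~ 1 | Subject, method = "ML")

# Fixed effects: the average intercept and the slope for age
fixef(fit_ri)

# Variance and standard deviation of the random intercepts,
# followed by the residual variance
VarCorr(fit_ri)

# Predicted deviation of each Subject's intercept from the average
head(ranef(fit_ri))
```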

>
> I'm sorry for sounding so lost. This is because I am. Perhaps I need to know
> more theory of mixed models, but this seems to be possible only if I

Understanding more theory would probably help.  The Pinheiro and Bates
text, as wonderful as it is, may not be the easiest place to start.  I
have not seen you mention anything about a grouping or nesting
structure in your data, which may be part of the confusion too, since
many uses of mixed models are for that case.  A classical example
would be students nested within classrooms.  In that case, the
research question could be: does the number of hours spent on homework
predict grades?  The model could look like:

grades ~ 1 + homework

or in ordinary regression notation instead of R's formula:

grades = Xb + e

where X is a design matrix the first few rows of which might look like:

1  2
1  2.5
1  3
1  2

The first column is the intercept and the second is the number of
hours spent on homework for each student.  b is a vector of
coefficients, the first coefficient being the estimated intercept and
the second the slope of grades on homework.  e is a vector of
residuals, the part of grades that cannot be explained by the
intercept and homework.  The assumption in ordinary regression is that
the elements of e are independent and identically distributed, but
students are within classrooms, and we might guess that each student
is not really an independent observation: there is some similarity
because they share a classroom.  Mixed models address this by adding
random effects.  Following the above example, we might do:

grades ~ 1 + homework
random = ~ 1 | ClassroomID

This allows the intercepts to vary randomly by classroom, which is
sensible: some classes may have more or less skilled students, so even
if everyone did 0 hours of homework, we still might expect some
classrooms to have higher or lower grades.  This models that.  Now
let's say that, further, you think the effect of homework might vary
across classrooms.  Perhaps students in very low performing classes
get an enormous benefit from spending time on homework, whereas in the
very high performing classes, grades only marginally improve for every
additional hour of homework.  You could then write this as:

grades ~ 1 + homework
random = ~ 1 + homework | ClassroomID
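
To make the two specifications above concrete, here is a sketch on
simulated data.  Everything about the data (30 classrooms of 20
students, the variable names, the effect sizes) is made up purely for
illustration, so treat it as a toy example and not as a template for
your analysis.

```r
library(nlme)

set.seed(2012)
n_class   <- 30                        # hypothetical number of classrooms
n_student <- 20                        # hypothetical students per classroom
idx <- rep(seq_len(n_class), each = n_student)

homework <- round(runif(n_class * n_student, 0, 5), 1)
u0 <- rnorm(n_class, 0, 5)             # classroom deviations in intercept
u1 <- rnorm(n_class, 0, 2)             # classroom deviations in slope
grades <- (60 + u0[idx]) + (4 + u1[idx]) * homework +
  rnorm(n_class * n_student, 0, 3)     # student-level residual

classes <- data.frame(ClassroomID = factor(idx), homework, grades)

# The fixed-effects design matrix X: a column of 1s and homework
head(model.matrix(grades ~ 1 + homework, data = classes), 4)

# Random intercepts only
fit_int <- lme(grades ~ 1 + homework, data = classes,
               random = ~ 1 | ClassroomID, method = "ML")

# Random intercepts and random homework slopes
fit_slope <- lme(grades ~ 1 + homework, data = classes,
                 random = ~ 1 + homework | ClassroomID, method = "ML")

VarCorr(fit_slope)         # variances of intercepts and slopes
anova(fit_int, fit_slope)  # compare the two random-effects structures
```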

Your data may not be like that, but that (or something along those
lines) is very common and is probably what you will see many, many
examples of.  It is not clear to me what A, B, C, and D represent for
you, so it is hard to be very specific about what you should or should
not be doing (that is where knowing your data and the theory, or
consulting with a local statistician, can be very helpful).

> understand what parts of the lme function are.

I would advocate learning the theory and the code hand in hand.  I do
not know of any good introductory texts that teach mixed models and R
together (that does not mean they do not exist, just that I do not
know them).  As I said earlier, I do like the Pinheiro and Bates book,
but I would not give it to a social sciences graduate student or
someone with minimal mathematical and statistical background.  Are
there any resources at your university?  (Are you at a university?)

Cheers,

Josh

>
> Thank you very much again,
>
> Daisuke



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


