[R] Can lmer() fit a multilevel model embedded in a regression?

Mon May 22 03:45:01 CEST 2006

Andrew Gelman wrote:
> Harold,
> 
> I get confused by the terms "fixed" and "random".  Our first-level model 
> (in the simplified version we're discussing here) has 800 data points 
> (the persons in the study) and 84 predictors:  sex, age, and 82 
> coefficients for foods.  The second-level model has 82 data points (the 
> foods) and two predictors:  a constant term and folic acid concentration.
> 
> It would be hopeless to estimate the 82 food coefficients via maximum 
> likelihood, so the idea is to do a multilevel model, with a regression 
> of these coefficients on the constant term and folic acid.  The 
> group-level model has a residual variance.  If the group-level residual 
> variance is 0, it's equivalent to ignoring food, and just using total 
> folic acid as an individual predictor.  If the group-level residual 
> variance is infinity, it's equivalent to estimating the original 
> regression (with 84 predictors) using least squares.
> 
> The difficulty is that the foods aren't "groups" in the usual sense, 
> since persons are not nested within foods; rather, each person eats many 
> foods, and this is reflected in the X matrix.
> 
> Andrew

A great reference for that kind of model, plus a way to 'connect' foods 
via a composition matrix is

   author =               {Greenland, Sander},
   title =                {When should epidemiologic regressions use 
random coeff
icients?},
   journal =      Biometrics,
   year =                 2000,
   volume =               56,
   pages =                {915-921},
   annote =               {Bayesian methods;causal inference;empirical 
Bayes estimators;epidemiologic method;hierarchical regression;mixed
models;multilevel modeling;random-coefficient
regression;shrinkage;variance components;use of statistics in
epidemiology is largely primitive;stepwise variable selection on
confounders leaves important confounders uncontrolled;composition
matrix;example with far too many significant predictors with many
regression coefficients absurdly inflated when
overfit;lack of evidence for dietary effects mediated through
constituents;shrinkage instead of variable selection;larger effect on
confidence interval width than on point estimates with variable
selection;uncertainty about variance of random effects is just
uncertainty about prior opinion;estimation of variance is
pointless;instead the analysis shuld be repeated using different
values;"if one feels compelled to estimate $\tau^2$, I would recommend
giving it a proper prior concentrated amount contextually reasonable
values";claim about ordinary MLE being unbiased is misleading because
it assumes the model is correct and is the only model
entertained;shrinkage towards compositional model;"models need to be
complex to capture uncertainty about the relations...an honest
uncertainty assessment requires parameters for all effects that we
know may be present.  This advice is implicit in an antiparsimony
principle often attributed to L. J. Savage 'All models should be as
big as an elephant (see Draper, 1995)'"}

Frank

> 
> Doran, Harold wrote:
> 
> 
>>OK, I'm piecing this together a bit, sorry I'm not familiar with the 
>>article you cite. Let me try and fully understand the issue if you 
>>don't mind. Are you estimating each of the 82 foods as fixed effects? 
>>If so, in the example below this implies 84 total fixed effects (1 for 
>>each food type in the X matrix and then sex and age).
>>
>>I'm assuming that food type is nested within one of the 82 folic acid 
>>concentrations and then folic acid is treated as a random effect.
>>
>>Is this accurate?
>>
>>
>>-----Original Message-----
>>From:   Andrew Gelman [mailto:gelman at stat.columbia.edu]
>>Sent:   Sun 5/21/2006 9:17 AM
>>To:     Doran, Harold
>>Cc:     r-help at stat.math.ethz.ch; reg26 at columbia.edu
>>Subject:        Re: [R] Can lmer() fit a multilevel model embedded in 
>>a regression?
>>
>>Harold,
>>
>>I'm confused now.  Just for concretness, suppose we have 800 people, 82
>>food items, and one predictor ("folic", the folic acid concentration) at
>>the food-item level.  Then DV will be a vector of length 800, foods is
>>an 800 x 82 matrix, sex is a vector of length 800, age is a vector of
>>length 800, and folic is a vector of length 82.  The vector of folic
>>acid concentrations in individual diets is then just foods%*%folic,
>>which I can call folic_indiv.
>>
>>How would I fit the model in lmer(), then?  There's some bit of
>>understading that I'm still missing.
>>
>>Thanks.
>>Andrew
>>
>>
>>Doran, Harold wrote:
>>
>>
>>>Prof Gelman:
>>>
>>>I believe the answer is yes. It sounds as though persons are partially
>>>crossed within food items?
>>>
>>>Assuming a logit link, the syntax might follow along the lines of
>>>
>>>fm1 <- lmer(DV ~ foods + sex + age + (1|food_item), data, family = 
>>>binomial(link='logit'), method = "Laplace", control = list(usePQL=
>>>FALSE) )
>>>
>>>Maybe this gets you partly there.
>>>
>>>Harold
>>>
>>>
>>>
>>>-----Original Message-----
>>>From:   r-help-bounces at stat.math.ethz.ch on behalf of Andrew Gelman
>>>Sent:   Sat 5/20/2006 5:49 AM
>>>To:     r-help at stat.math.ethz.ch
>>>Cc:     reg26 at columbia.edu
>>>Subject:        [R] Can lmer() fit a multilevel model embedded in a
>>>regression?
>>>
>>>I would like to fit a hierarchical regression model from Witte et al.
>>>(1994; see reference below).  It's a logistic regression of a health
>>>outcome on quntities of food intake; the linear predictor has the form,
>>>X*beta + W*gamma,
>>>where X is a matrix of consumption of 82 foods (i.e., the rows of X
>>>represent people in the study, the columns represent different foods,
>>>and X_ij is the amount of food j eaten by person i); and W is a matrix
>>>of some other predictors (sex, age, ...).
>>>
>>>The second stage of the model is a regression of X on some food-level
>>>predictors.
>>>
>>>Is it possible to fit this model in (the current version of) lmer()?
>>>The challenge is that the persons are _not_ nested within food items, so
>>>it is not a simple multilevel structure.
>>>
>>>We're planning to write a Gibbs sampler and fit the model directly, but
>>>it would be convenient to be able to flt in lmer() as well to check.
>>>
>>>Andrew
>>>
>>>---
>>>
>>>Reference:
>>>
>>>Witte, J. S., Greenland, S., Hale, R. W., and Bird, C. L. (1994).
>>>Hierarchical regression analysis applied to a
>>>study of multiple dietary exposures and breast cancer.  Epidemiology 5,
>>>612-621.
>>>
>>>--
>>>Andrew Gelman
>>>Professor, Department of Statistics
>>>Professor, Department of Political Science
>>>gelman at stat.columbia.edu
>>>www.stat.columbia.edu/~gelman
>>>
>>>Statistics department office:
>>>  Social Work Bldg (Amsterdam Ave at 122 St), Room 1016
>>>  212-851-2142
>>>Political Science department office:
>>>  International Affairs Bldg (Amsterdam Ave at 118 St), Room 731
>>>  212-854-7075
>>>
>>>Mailing address:
>>>  1255 Amsterdam Ave, Room 1016
>>>  Columbia University
>>>  New York, NY 10027-5904
>>>  212-851-2142
>>>  (fax) 212-851-2164
>>>
>>>______________________________________________
>>>R-help at stat.math.ethz.ch mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide!
>>>http://www.R-project.org/posting-guide.html
>>>
>>>
>>
>>--
>>Andrew Gelman
>>Professor, Department of Statistics
>>Professor, Department of Political Science
>>gelman at stat.columbia.edu
>>www.stat.columbia.edu/~gelman
>>
>>Statistics department office:
>>  Social Work Bldg (Amsterdam Ave at 122 St), Room 1016
>>  212-851-2142
>>Political Science department office:
>>  International Affairs Bldg (Amsterdam Ave at 118 St), Room 731
>>  212-854-7075
>>
>>Mailing address:
>>  1255 Amsterdam Ave, Room 1016
>>  Columbia University
>>  New York, NY 10027-5904
>>  212-851-2142
>>  (fax) 212-851-2164
>>
>>
>>
> 
> 

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University