[R] lmer4 and variable selection

Bert Gunter gunter.berton at gene.com
Mon Aug 25 18:55:20 CEST 2008


You **really** should work with a local statistician. Remote statistical
advice (this is not really about R) from even well-meaning helpers
unfamiliar with your work is very risky. For example, I would suggest
making all sorts of plots (statistical summaries alone are wholly inadequate
and potentially quite misleading), but exactly what to plot, how to
interpret what the plots show, and what to do next would depend on both the
subject matter background (how the study was conducted and what sorts of
mechanisms are expected, for example) and what the plots revealed.

Like the gangster movies (used to) say: just a friendly warning ...  :)

-- Bert Gunter
Genentech


----- Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Andreas Nord
Sent: Monday, August 25, 2008 9:22 AM
To: r-help at r-project.org
Subject: [R] lmer4 and variable selection


Dear list, 

I am currently working with a rather large data set on body temperature
regulation in wintering birds. My original model contains quite a few
explanatory variables, but I do not (of course) wish to keep them all in my
final model. I've fitted the following model to the data:

> temp.lme1 <- lmer(T.B ~ tarsus + wing + weight + factor(age) + factor(sex) +
+     fat + minsunset + day1oct + day1oct.2 + minnight + ave.day +
+     minnight.1 + T.A + ave.night.1 + (1 | ID) + (1 | sign),
+     data = bodytemp.df)

where T.B is body temperature; the explanatory variables are a number of
biometric measures (tarsus, wing, weight, fat, age, sex) and various measures
of ambient temperature (ave.day, minnight.1, minnight, ave.night.1, T.A) and
time/date (minsunset, day1oct, day1oct.2). Random factors are ID (individuals
were sampled one to three times each) and sign (person performing the
measurements; 2 levels).

Model output looks like this:

> summary(temp.lme1)
Linear mixed model fit by REML 
Formula: T.B ~ tarsus + wing + weight + factor(age) + factor(sex) + fat +
    minsunset + day1oct + day1oct.2 + minnight + ave.day + minnight.1 +
    T.A + ave.night.1 + (1 | ID) + (1 | sign) 
   Data: bodytemp.df 
   AIC BIC logLik deviance REMLdev
 557.8 614 -260.9      441   521.8
Random effects:
 Groups   Name        Variance   Std.Dev.  
 ID       (Intercept) 1.0399e-01 0.32247096
 sign     (Intercept) 6.2663e-08 0.00025033
 Residual             8.0162e-01 0.89533134
Number of obs: 167, groups: ID, 124; sign, 2

Fixed effects:
                 Estimate Std. Error t value
(Intercept)     4.124e+01  4.104e+00  10.049
tarsus         -5.925e-02  5.801e-02  -1.021
wing           -6.252e-02  4.984e-02  -1.254
weight          1.499e-01  1.446e-01   1.037
factor(age)2K+  1.981e-01  1.651e-01   1.200
factor(sex)M    9.232e-02  2.146e-01   0.430
fat            -2.297e-02  8.150e-02  -0.282
minsunset      -1.104e-03  1.043e-03  -1.058
day1oct        -4.247e-03  2.879e-02  -0.148
day1oct.2       5.087e-05  1.560e-04   0.326
minnight       -5.987e-02  7.022e-02  -0.853
ave.day         1.128e-01  1.582e-01   0.713
minnight.1     -9.590e-02  1.684e-01  -0.570
T.A            -4.855e-02  5.185e-02  -0.936
ave.night.1     1.420e-01  2.477e-01   0.573

Correlation of Fixed Effects:
            (Intr) tarsus wing   weight f()2K+ fct()M fat    mnsnst day1ct dy1c.2 mnnght ave.dy mnng.1 T.A   
tarsus      -0.851
wing        -0.870  0.966
weight       0.071 -0.417 -0.411
factr(g)2K+  0.211 -0.248 -0.241  0.219
factor(sx)M  0.573 -0.499 -0.526 -0.179  0.105
fat         -0.037  0.046  0.052 -0.264 -0.152  0.045
minsunset   -0.177 -0.144 -0.122  0.214 -0.101 -0.027 -0.045
day1oct     -0.261 -0.051 -0.052 -0.117 -0.145  0.140  0.131  0.515
day1oct.2    0.257  0.050  0.051  0.121  0.141 -0.149 -0.125 -0.484 -0.993
minnight    -0.074  0.249  0.216 -0.271 -0.032 -0.043  0.022  0.022 -0.168  0.231
ave.day     -0.025  0.070  0.050  0.001  0.045 -0.022  0.046 -0.363 -0.120  0.041 -0.415
minnight.1   0.304 -0.081 -0.045  0.069  0.129  0.012 -0.054 -0.349 -0.636  0.644  0.023  0.052
T.A          0.049 -0.043  0.018  0.130  0.040 -0.164 -0.065 -0.317 -0.288  0.249 -0.598  0.267  0.143
ave.night.1 -0.234  0.004 -0.015 -0.030 -0.110  0.016  0.031  0.493  0.614 -0.586  0.105 -0.524 -0.863 -0.243
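
The correlation of -0.993 between the day1oct and day1oct.2 estimates is
what one would typically see when a quadratic term is built from an
uncentered predictor. The post does not say how day1oct.2 was constructed,
so the sketch below assumes it is the square of day1oct and uses a simulated
variable, day (hypothetical values, not the real data), to show how centering
before squaring reduces the correlation between the linear and quadratic
terms.

```r
set.seed(42)

# Hypothetical stand-in for day1oct (assumed to be days since 1 October)
day <- runif(200, min = 1, max = 150)

# Correlation between the raw linear and quadratic terms
r.raw <- cor(day, day^2)

# Center the predictor first, then square it
day.c <- day - mean(day)
r.centered <- cor(day.c, day.c^2)

print(c(raw = r.raw, centered = r.centered))
```

Whether centering is appropriate for the real data depends on how the date
variables were defined, which only the original analyst can judge.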

At this point, I want to select the variables with the most explanatory
power to arrive at a final model. However, I'm not sure how to do
this, because (not being a trained statistician) I'm used to having p-values
to guide me. Similarly, I would like to be able to report the relative
"importance" of variables in some way but, as is apparent from a number of
threads, p-values seem to be the least preferred option when it comes to
lmer. I've read about the mcmcsamp() function, but I'm not entirely sure
how to use it or how to interpret the output.
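
One commonly used alternative to per-coefficient p-values is to compare
nested models directly. The sketch below is a minimal illustration on
simulated data, not an analysis of bodytemp.df: the data frame sim.df, its
effect sizes, and the model names m.full and m.reduced are all made up for
the example. Both models are fitted by maximum likelihood (REML = FALSE),
since REML criteria are not comparable between models with different fixed
effects; anova() then reports AIC, BIC and a likelihood-ratio test. Note
also that mcmcsamp() was dropped from later lme4 releases, so it may not be
available in a current installation.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for bodytemp.df: 60 birds, 1 to 3 measurements each
n_id  <- 60
n_rep <- sample(1:3, n_id, replace = TRUE)
ID    <- factor(rep(seq_len(n_id), times = n_rep))
n     <- length(ID)

sim.df <- data.frame(
  ID     = ID,
  T.A    = rnorm(n, mean = 0, sd = 5),   # ambient temperature (made up)
  weight = rnorm(n, mean = 18, sd = 1)   # body mass (made up)
)

# Body temperature depends on T.A and a per-bird random intercept;
# weight has no effect in this simulation
bird_effect <- rnorm(n_id, sd = 0.3)
sim.df$T.B <- 41 + 0.05 * sim.df$T.A +
  bird_effect[as.integer(sim.df$ID)] + rnorm(n, sd = 0.9)

# Fit both models by ML so they can be compared on their fixed effects
m.full    <- lmer(T.B ~ T.A + weight + (1 | ID), data = sim.df, REML = FALSE)
m.reduced <- lmer(T.B ~ T.A + (1 | ID), data = sim.df, REML = FALSE)

# Likelihood-ratio test (plus AIC/BIC) for dropping weight
cmp <- anova(m.reduced, m.full)
print(cmp)
```

With few observations per individual and a modest sample, the chi-square
reference distribution is only approximate, so the resulting p-value is best
read as a rough guide rather than an exact test.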

Any advice would be most appreciated.


Kind regards, 
Andreas Nord                   
