[R] Significance of interaction depends on factor reference level - lmer/AIC model averaging

Sun Jul 1 23:57:21 CEST 2012

Andy Robertson <ar313 <at> exeter.ac.uk> writes:

> I am using lmer combined with AIC model selection and averaging (in the
> MuMIn package) to try and assess how isotope values (which indicate diet)
> vary within a population of animals.
> 
> I have multiple measures from individuals (variable 'Tattoo') and multiple
> individuals within social groups within 4 locations (A, B, C ,D) crucially I
> am interested if there are differences between sexes and age classes
> (variable AGECAT2) and whether this differs with location.
> 
> However, whether or not I get a significant sex:location interaction depends
> on which location is my reference level and I cannot understand why this is
> the case. It seems to be due to the fact that the standard error associated
> with my interactions varies depending on which level is the reference.
> 
> Any help or advice would be appreciated,
> 
> Andrew Robertson

  This is all a little overwhelming.  I appreciate that you
are trying to be thorough, but there's an awful lot to look at here ...
I will give comments until the point where I ran out of time.
> 
> Below is the example code of what I am doing and an example of the model
> summary and model averaging results with location A as the ref level or
> location B.
> 
> if A is the reference level...
> 
> #full model
> 
> Amodel<-lmer(d15N~(AGECAT2+Sex+Location1+AGECAT2:Location1+Sex:Location1+AGE
> CAT2:Sex+(1|Year)+(1|Location1/Socialgroup/Tattoo)), REML=FALSE,
> data=nocubs)

  Note that you have Location in your model twice, once as a fixed
effect and once as a random effect.  This is bound to lead to trouble.
If you use (1|Location1:Socialgroup) and (1|Location1:Socialgroup:Tattoo)
you will get the random effects you want without also incorporating
a random effect of Location1.

  You could specify the fixed effects as
(AGECAT2+Sex+Location1)^2 if you wanted (it would be equivalent
to this specification).

> 
> #standardise model
> Amodels<-standardize(Amodel, standardize.y=FALSE)

  is this from the 'rockchalk' package?  Do you know that it
isn't doing something funny?

> #dredge models
> summary(model.avg(get.models(Adredge,cumsum(weight)<0.95)))
> 
> Then the average model coefficients indicate no sex by location interaction
>  
> Component models:
>       df  logLik    AICc Delta Weight
> 235   13 -765.33 1557.28  0.00   0.68
> 1235  15 -764.55 1559.91  2.63   0.18
> 3      9 -771.64 1561.57  4.29   0.08
> 12345 17 -763.67 1562.37  5.09   0.05
> 
> Term codes:
>         AGECAT2           c.Sex       Location1   AGECAT2:c.Sex
> c.Sex:Location1 
>               1               2               3               4
> 5 
> 

  What is c.Sex?  "centered sex" (e.g -1 for males and +1 for females?

  In general I think it is a bad idea to model-average sets of models
some of which contain interactions, because (unless the design is
perfectly balanced and the contrasts are set to sum-to-zero contrasts),
the meaning of the main effects changes between models.  In a model
with an interaction (assuming sum-to-zero contrasts), the main effect
represents the average effect across groups using equal weights:
for example the main effect of sex would be the mean of the
male and female predictions.  In the model without an interaction,
the main effect of groups will represent the average across groups
weighting by the number of individuals per group ...

> Model-averaged coefficients: 
>                        Estimate Std. Error z value Pr(>|z|)    
> (Intercept)            8.673592   0.474524  18.279   <2e-16 ***
> c.Sex                  0.095375   0.452065   0.211    0.833    
> Location1B            -3.972882   0.556575   7.138   <2e-16 ***
> Location1C            -3.633331   0.531858   6.831   <2e-16 ***
> Location1D            -3.348665   0.539143   6.211   <2e-16 ***
> c.Sex:Location1B      -0.372653   0.513492   0.726    0.468    
> c.Sex:Location1C       0.428299   0.511254   0.838    0.402    
> c.Sex:Location1D      -0.757582   0.512586   1.478    0.139    
> AGECAT2OLD            -0.179772   0.150842   1.192    0.233    
> AGECAT2YEARLING       -0.009596   0.132328   0.073    0.942    
> AGECAT2OLD:c.Sex       0.045963   0.296471   0.155    0.877    
> AGECAT2YEARLING:c.Sex -0.323985   0.268919   1.205    0.228    

 In general you should not test terms involving categorical variables
(e.g. sex:location) by looking at all of the individual parameter
z-values, but by comparing models with and without the term.
This gets harder when you are doing model averaging. In general
I would say that model averaging and information-theoretic approaches
in general are best for *prediction*, while good old-fashioned
frequentist approaches are best for *hypothesis testing*, which
seems to be what you are trying to do ...

 Also note that the summary is giving you the results of Z-tests,
which do not take the finite size of the data set into account.

> And the full model summary looks like this..
> 
> Linear mixed model fit by maximum likelihood 
> 
> Formula: d15N ~ (AGECAT2 + Sex + Location1 + AGECAT2:Location1 +
> Sex:Location1 +      AGECAT2:Sex + (1 | Year) + (1 |
> Location1/Socialgroup/Tattoo)) 
> 
>    Data: nocubs 
> 
>   AIC  BIC logLik deviance REMLdev
> 
> 1568 1670 -761.1     1522    1534
> 
> Random effects:
> 
> Groups                         Name        Variance Std.Dev.
> 
> Tattoo:(Socialgroup:Location1) (Intercept) 0.35500  0.59582 
>  Socialgroup:Location1          (Intercept) 0.35620  0.59682 
>  Location1                      (Intercept) 0.00000  0.00000 
>  Year                           (Intercept) 0.00000  0.00000 
>  Residual                                   0.49584  0.70416 

  Note here that you're getting zero variances for the location
and year variances, and almost identical variances for the
other two random effects (which looks a little fishy to me,
but I can't quite say that it's wrong). 

> Number of obs: 608, groups: Tattoo:(Socialgroup:Location1), 132;
> Socialgroup:Location1, 22; Location1, 4; Year, 2

   Trying to fit a 4-level or even more extremely a 2-level factor
as a random effect is almost guaranteed to give you zero variance
estimates.  I would strongly consider fitting Location and Year
as fixed effects (you can still include social group within 
location and individual within social group as random effects).
(See point above about how to exclude Location from the random
effects.)

> Fixed effects:
>                            Estimate Std. Error t value
> (Intercept)                 8.83179    0.52961  16.676
> AGECAT2OLD                 -0.44101    0.41081  -1.074
> AGECAT2YEARLING             0.01805    0.38698   0.047
> SexMale                    -0.11346    0.51239  -0.221
> Location1B                 -3.97880    0.63063  -6.309
> Location1C                 -4.04816    0.60404  -6.702
> Location1D                 -3.36389    0.63304  -5.314
> AGECAT2OLD:Location1B       0.44198    0.54751   0.807
> AGECAT2YEARLING:Location1B -0.22134    0.52784  -0.419
> AGECAT2OLD:Location1C       0.20684    0.50157   0.412
> AGECAT2YEARLING:Location1C  0.24132    0.47770   0.505
> AGECAT2OLD:Location1D       0.53653    0.52778   1.017
> AGECAT2YEARLING:Location1D  0.51755    0.51038   1.014
> SexMale:Location1B         -0.02442    0.57546  -0.042
> SexMale:Location1C          0.74680    0.58128   1.285
> SexMale:Location1D         -0.41800    0.59505  -0.702
> AGECAT2OLD:SexMale         -0.08907    0.32513  -0.274
> AGECAT2YEARLING:SexMale    -0.40146    0.30409  -1.320
> 
> If location B is the reference level then the average model coefficients
> indicate an age by sex interaction in location C.

  ???  Do you mean an effect of sex in location C? I don't see where
the interaction with age comes in ...

 Also note that you seem to have changed from "c.Sex" (a continuous
variable, according to the model summary) to "Sex" (a factor with
"Female" as the first level and "Male" as the second).  Is that
responsible for the differences you are seeing?

> Component models:
>       df  logLik    AICc Delta Weight
> 235   13 -765.33 1557.28  0.00   0.68
> 1235  15 -764.55 1559.91  2.63   0.18
> 3      9 -771.64 1561.57  4.29   0.08
> 12345 17 -763.67 1562.37  5.09   0.05
> 
> Term codes:
>         AGECAT2           c.Sex       Location2   AGECAT2:c.Sex
> c.Sex:Location2 
>               1               2               3               4
> 5 
> 
> Model-averaged coefficients: 
>                        Estimate Std. Error z value Pr(>|z|)    
> (Intercept)            4.700710   0.294275  15.974   <2e-16 ***
> c.Sex                 -0.277278   0.248093   1.118   0.2637    
> Location2A             3.972882   0.556575   7.138   <2e-16 ***
> Location2C             0.339551   0.379873   0.894   0.3714    
> Location2D             0.624217   0.390063   1.600   0.1095    
> c.Sex:Location2A       0.372653   0.513492   0.726   0.4680    
> c.Sex:Location2C       0.800952   0.345898   2.316   0.0206 *  
> c.Sex:Location2D      -0.384929   0.346832   1.110   0.2671    
> AGECAT2OLD            -0.179772   0.150842   1.192   0.2333    
> AGECAT2YEARLING       -0.009596   0.132328   0.073   0.9422    
> AGECAT2OLD:c.Sex       0.045963   0.296471   0.155   0.8768    
> AGECAT2YEARLING:c.Sex -0.323985   0.268919   1.205   0.2283    
> 

  stopped here ...

  In general it's not surprising that the apparent effect measured
in the way you have parameterized and are measuring it changes
with parameterization.  The parameters mean different things and
are using a different baseline ...

  A lot of this is basic (although not easy) stuff about
parameterization.