[R] nlme fixed effects specification

Sat May 5 22:14:42 CEST 2007

On 5/4/07, ivo welch <ivowel at gmail.com> wrote:
> hi doug:  yikes.  could I have done better?  Oh dear.  I tried to make
> my example clearer half-way through, but made it worse.  I meant

> set.seed(1);
> fe = as.factor( as.integer( runif(100)*10 ) ); y=rnorm(100); x=rnorm(100);
> print(summary(lm( y ~ x + fe)))
>   <deleted>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   0.1128     0.3680    0.31     0.76
> x             0.0232     0.0960    0.24     0.81
> fe1          -0.6628     0.5467   -1.21     0.23
>   <deleted more fe's>
> Residual standard error: 0.949 on 89 degrees of freedom
> Multiple R-Squared: 0.0838,     Adjusted R-squared: -0.0192
> F-statistic: 0.814 on 10 and 89 DF,  p-value: 0.616

> I really am interested only in this linear specification, the
> coefficient on x (0.0232) and the R^2 of 8.38% (adjusted -1.92%).  If
> I did not have so much data in my real application, I would never have
> to look at nlme or nlme4.  I really only want to be able to run this
> specification through lm with far more observations (100,000) and
> groups (10,000), and be done with my problem.

OK, I understand now.  You really do want to accomodate for the levels
of the factor as fixed effects not as random effects.  The lme and
lmer functions are fitting a more complicated model in which the
variance of the random effects is chosen to maximize the
log-likelihood or the restricted log-likelihood but they don't give
the results that are of interest to you.

As Roger indicated in another reply you should be able to obtain the
results you want by sweeping out the means of the groups from both x
and y.  However, I tried Roger's function and a modified version that
I wrote and could not show this.  I'm not sure what I am doing wrong.

I enclose a transcript that shows that I can reproduce the result from
Roger's function but it doesn't do what either of us think it should.
BTW, I realize that the estimate for the Intercept should be zero in
this case.

> now, with a few IQ points more, I would have looked at the lme
> function instead of the nlme function in library(nlme).    [then
> again, I could understand stats a lot better with a few more IQ
> points.]  I am reading the lme description now, but I still don't
> understand how to specify that I want to have dummies in my
> specification, plus the x variable, and that's it.  I think I am not
> understanding the integration of fixed and random effects in the same
> R functions.
>
> thanks for pointing me at your lme4 library.  on linux, version 2.5.0, I did
>   R CMD INSTALL matrix*.tar.gz
>   R CMD INSTALL lme4*.tar.gz
> and it installed painlessly.  (I guess R install packages don't have
> knowledge of what they rely on;  lme4 requires matrix, which the docs
> state, but having gotten this wrong, I didn't get an error.  no big
> deal.  I guess I am too used to automatic resolution of dependencies
> from linux installers these days that I did not expect this.)
>
> I now tried your specification:
>
> > library(lme4)
> Loading required package: Matrix
> Loading required package: lattice
> > lmer(y~x+(1|fe))
> Linear mixed-effects model fit by REML
> Formula: y ~ x + (1 | fe)
>  AIC BIC logLik MLdeviance REMLdeviance
>  282 290   -138        270          276
> Random effects:
>  Groups   Name        Variance       Std.Dev.
>  fe       (Intercept) 0.000000000445 0.0000211
>  Residual             0.889548532468 0.9431588
> number of obs: 100, groups: fe, 10
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  -0.0188     0.0943  -0.199
> x             0.0528     0.0904   0.585
>
> Correlation of Fixed Effects:
>   (Intr)
> x -0.022
> Warning messages:
> 1: Estimated variance for factor 'fe' is effectively zero
>  in: `LMEoptimize<-`(`*tmp*`, value = list(maxIter = 200L, tolerance =
> 0.0000000149011611938477,
> 2: $ operator not defined for this S4 class, returning NULL in: x$symbolic.cor
>
> Without being a statistician, I can still determine that this is not
> the model I would like to work with.  The coefficient is 0.0528, not
> 0.0232.  (I am also not sure why I am getting these warning messages
> on my system, either, but I don't think it matters.)
>
> is there a simple way to get the equivalent specification for my smple
> model, using lmer or lme, which does not choke on huge data sets?
>
> regards,
>
> /ivo
>
-------------- next part --------------
> set.seed(1)
> y <- rnorm(100); x <- rnorm(100)
> fe <- gl(10, 10)
> head(coef(summary(lm(y ~ x + fe))))
               Estimate Std. Error     t value  Pr(>|t|)
(Intercept)  0.12203179  0.2955477  0.41290050 0.6806726
x            0.02927071  0.1053909  0.27773478 0.7818601
fe2          0.13032049  0.4176603  0.31202505 0.7557513
fe3         -0.25552441  0.4164178 -0.61362502 0.5410283
fe4          0.01123732  0.4227301  0.02658272 0.9788521
fe5          0.02836583  0.4255255  0.06666071 0.9470013
> coef(summary(lm(diffid(y, fe) ~ diffid(x, fe))))
                  Estimate Std. Error      t value  Pr(>|t|)
(Intercept)   9.802350e-18 0.08837912 1.109125e-16 1.0000000
diffid(x, fe) 2.927071e-02 0.10043495 2.914394e-01 0.7713312
> diffid1
function(h, id) {
    id <- as.factor(id)[ , drop = TRUE]
    apply(as.matrix(h), 2, function(x) x - tapply(x, id, mean)[id])
}
> coef(summary(lm(diffid1(y, fe) ~ diffid1(x, fe))))
                   Estimate Std. Error      t value  Pr(>|t|)
(Intercept)    9.802350e-18 0.08837912 1.109125e-16 1.0000000
diffid1(x, fe) 2.927071e-02 0.10043495 2.914394e-01 0.7713312