[R] Components of variance

Douglas Bates dmbates at gmail.com
Sun Jun 26 20:40:36 CEST 2005


On 6/26/05, Spencer Graves <spencer.graves at pdf.com> wrote:
> Hi, Renaud:
> 
>           Thanks for the link to Doug Bates' recent comment on this.
> Unfortunately, the information contained therein is not enough for me to
> easily obtain all the required standard errors and write code to make
> confidence intervals.  To be more specific, I'm able to produce the same
> answers as before using lmer in place of lme as follows:
> 
> > library(lme4)
> > set.seed(2)
> > lot <- rep(LETTERS[1:3], each=9)
> > lot.e <- rep(rnorm(3), each=9)
> > wf <- paste(lot, rep(1:9, each=3))
> > wf.e <- rep(rnorm(9), each=3)
> > DF <- data.frame(lot=lot, wafer=wf,
> + Vt=lot.e+wf.e+rnorm(27))
> > (fitr <- lmer(Vt~1+(1|lot)+(1|wafer), DF))
> <snip>
> Random effects:
>   Groups   Name        Variance Std.Dev.
>   wafer    (Intercept) 0.96054  0.98007
>   lot      (Intercept) 1.51434  1.23058
>   Residual             1.34847  1.16124
> # of obs: 27, groups: wafer, 9; lot, 3
> 
> Fixed effects:
>              Estimate Std. Error DF t value Pr(>|t|)
> (Intercept)  0.60839    0.81329 26  0.7481   0.4611
> 
>           These numbers agree with what I got with lme.  I then followed your
> suggetion to "http://tolstoy.newcastle.edu.au/R/help/05/06/7291.html".
> This suggested I try str(VarCorr(fitr)):
> 
> > vcFit <- VarCorr(fitr)
> > str(vcFit)
> Formal class 'VarCorr' [package "Matrix"] with 3 slots
>    ..@ scale   : num 1.16
>    ..@ reSumry :List of 2
>    .. ..$ wafer:Formal class 'corrmatrix' [package "Matrix"] with 2 slots
>    .. .. .. ..@ .Data : num [1, 1] 1
>    .. .. .. .. ..- attr(*, "dimnames")=List of 2
>    .. .. .. .. .. ..$ : chr "(Intercept)"
>    .. .. .. .. .. ..$ : chr "(Intercept)"
>    .. .. .. ..@ stdDev: Named num 0.844
>    .. .. .. .. ..- attr(*, "names")= chr "(Intercept)"
>    .. ..$ lot  :Formal class 'corrmatrix' [package "Matrix"] with 2 slots
>    .. .. .. ..@ .Data : num [1, 1] 1
>    .. .. .. .. ..- attr(*, "dimnames")=List of 2
>    .. .. .. .. .. ..$ : chr "(Intercept)"
>    .. .. .. .. .. ..$ : chr "(Intercept)"
>    .. .. .. ..@ stdDev: Named num 1.06
>    .. .. .. .. ..- attr(*, "names")= chr "(Intercept)"
>    ..@ useScale: logi TRUE
> >
> 
>           Unfortunately, I've been so far unable to translate this into
> confidence intervals for the variance components that seem to match
> intervals(lme(...)).  The closest I came was as follows:
> 
> > 1.23058*sqrt(exp(qnorm(c(0.025, 0.975))*vcFit at reSumry$lot at stdDev))
> [1] 0.4356044 3.4763815
> > 0.98007*sqrt(exp(qnorm(c(0.025, 0.975))*vcFit at reSumry$wafer at stdDev))
> [1] 0.4286029 2.2410887
> >
>           The discrepancy between these numbers and intervals(lme(...)) is 25%
> for "lot", which suggest that I'm doing something wrong.  Moreover, I
> don't know how to get a similar interval for "scale", and I don't know
> how to access the estimated variance components other than manually.
> 
>           Comments?
>           Thanks,
>           spencer graves
> p.s.  R 2.1.0 patched under Windows XP.
> 
> Renaud Lancelot wrote:
> 
> > Spencer Graves a écrit :
> >
> >>        As far as I know, the best available is lme in library(nlme).  For
> >>more information, see the the following:
> >>
> >>        Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus
> >>(Springer)
> >>
> >>        Consider the following example:
> >>
> >> > set.seed(2)
> >> > lot <- rep(LETTERS[1:3], each=9)
> >> > lot.e <- rep(rnorm(3), each=9)
> >> > wf <- paste(lot, rep(1:9, each=3))
> >> > wf.e <- rep(rnorm(9), each=3)
> >> > DF <- data.frame(lot=lot, wafer=wf,
> >>+ Vt=lot.e+wf.e+rnorm(27))
> >> > (fit <- lme(Vt~1, random=~1|lot/wafer, DF))
> >>Linear mixed-effects model fit by REML
> >>   Data: DF
> >>   Log-restricted-likelihood: -48.44022
> >>   Fixed: Vt ~ 1
> >>(Intercept)
> >>   0.6083933
> >>
> >>Random effects:
> >>  Formula: ~1 | lot
> >>         (Intercept)
> >>StdDev:    1.230572
> >>
> >>  Formula: ~1 | wafer %in% lot
> >>         (Intercept) Residual
> >>StdDev:   0.9801403 1.161218
> >>
> >>Number of Observations: 27
> >>Number of Groups:
> >>            lot wafer %in% lot
> >>              3              9
> >> > (CI.fit <- intervals(fit))
> >>Approximate 95% confidence intervals
> >>
> >>  Fixed effects:
> >>                 lower      est.    upper
> >>(Intercept) -1.100281 0.6083933 2.317068
> >>attr(,"label")
> >>[1] "Fixed effects:"
> >>
> >>  Random Effects:
> >>   Level: lot
> >>                     lower     est.    upper
> >>sd((Intercept)) 0.3368174 1.230572 4.495931
> >>   Level: wafer
> >>                    lower      est.    upper
> >>sd((Intercept)) 0.426171 0.9801403 2.254201
> >>
> >>  Within-group standard error:
> >>     lower      est.     upper
> >>0.8378296 1.1612179 1.6094289
> >> > str(CI.fit)
> >>List of 3
> >>  $ fixed   : num [1, 1:3] -1.100  0.608  2.317
> >>   ..- attr(*, "dimnames")=List of 2
> >>   .. ..$ : chr "(Intercept)"
> >>   .. ..$ : chr [1:3] "lower" "est." "upper"
> >>   ..- attr(*, "label")= chr "Fixed effects:"
> >>  $ reStruct:List of 2
> >>   ..$ lot  :`data.frame':       1 obs. of  3 variables:
> >>   .. ..$ lower: num 0.337
> >>   .. ..$ est. : num 1.23
> >>   .. ..$ upper: num 4.5
> >>   ..$ wafer:`data.frame':       1 obs. of  3 variables:
> >>   .. ..$ lower: num 0.426
> >>   .. ..$ est. : num 0.98
> >>   .. ..$ upper: num 2.25
> >>   ..- attr(*, "label")= chr "Random Effects:"
> >>  $ sigma   : atomic [1:3] 0.838 1.161 1.609
> >>   ..- attr(*, "label")= chr "Within-group standard error:"
> >>  - attr(*, "level")= num 0.95
> >>  - attr(*, "class")= chr "intervals.lme"
> >> > diff(log(CI.fit$sigma))
> >>    est.   upper
> >>0.32641 0.32641
> >>
> >>        The last line combined with help for intervals.lme shows that the
> >>confidence interval for sigma (and doubtless also for lot and wafer
> >>variance components is based on a normal approximation for the
> >>distribution of log(sigma).
> >>
> >>        The state of the art is reflected in "lmer" in library(lme4),
> >>described in the following:
> >>
> >>        Doug Bates (2005) "Fitting linear mixed models in R" in R News 5/1
> >>available from "www.r-project.org" -> Newsletter
> >>
> >>        However, an "intervals" function is not yet available for "lmer"
> >>objects.
> >
> >
> > The issue of variance components in lme4 was recently by D. Bates:
> >
> > http://tolstoy.newcastle.edu.au/R/help/05/06/7291.html
> >
> > Also, a colleague wrote (in French) a nice report - with data and R code
> > on how to compute variance components, their bias and distribution with
> > lme (normal approximation , Monte Carlo simulation and delta method).
> > Let me know if you want it.
> >
> > Best,
> >
> > Renaud
> >
> >
> >
> >>
> >>        spencer graves
> >>
> >>John Sorkin wrote:
> >>
> >>
> >>
> >>>Could someone identify a function that I might use to perform a
> >>>components of variance analysis? In addition to the variance
> >>>attributable to each factor, I would also like to obtain the SE of the
> >>>variances.
> >>>Thank you,
> >>>John
> >>>
> >>>John Sorkin M.D., Ph.D.
> >>>Chief, Biostatistics and Informatics
> >>>Baltimore VA Medical Center GRECC and
> >>>University of Maryland School of Medicine Claude Pepper OAIC
> >>>
> >>>University of Maryland School of Medicine
> >>>Division of Gerontology
> >>>Baltimore VA Medical Center
> >>>10 North Greene Street
> >>>GRECC (BT/18/GR)
> >>>Baltimore, MD 21201-1524
> >>>
> >>>410-605-7119
> >>>----- NOTE NEW EMAIL ADDRESS:
> >>>jsorkin at grecc.umaryland.edu

Current versions of lmer do not return a Hessian matrix from the
optimization step so it would not be easy to create such intervals.  I
do have an analytic gradient from which a Hessian could be derived by
finite differences but even that wouldn't help without documentation
on how the profiled deviance is defined and the particular
parameterization used for the variance components.  These are not
arbitrary - they are the results of a considerable amount of research
and experimentation but I have not yet written it up.  It is not
overly complicated but it is also not entirely straightforward.

However, I do not regard getting standard errors of the estimates of
variance components as being important.  In fact I think I am doing a
service to the community by *not* providing them because they are
inevitably misused.

Think back to your first course in statistics when confidence
intervals were introduced.  You learned that a confidence interval on
the mean of a population (normally distributed or reasonably close to
normally distributed) is derived as the sample mean plus or minus a
multiple of the standard error of the mean.  What about a confidence
interval on the variance of such a population?  If you calculated that
as the estimate of the variance plus or minus a multiple of the
standard error of that estimate then you probably lost points on that
part of the assignment or exam because we don't calculate a confidence
interval in that way.  We know that the distribution of S^2 is a
scaled chi-squared distribution and often very far from symmetric and
a confidence interval constructed in the manner described above would
have poor properties and would not reflect the extent of our
uncertainty about the parameter.  And we never discussed at all a
hypothesis test on whether the population variance was zero.

Now let's go to a much more complicated situation with mixed effects
models.  We know that in the simplest possible case it would be
extremely misleading to create confidence intervals or perform
hypothesis tests based on an estimate of a variance and a standard
error of that estimate but we are perfectly happy to do so in the much
more complicated situation of variance components or, even worse,
estimates of covariances.  It doesn't make sense.  Even though SAS
PROC MIXED and HLM and MLWiN and Stata and ... all produce such tests
and such confidence intervals for mixed-effects models they still
don't make sense.  There is no good justification for their use other
than a vague hand-waving about asymptotics.

Until someone can explain to me a good use of quantities such as
standard errors of estimates of variances or covariances, I'm not
going to stop working on all of the other aspects of the code and
documentation and redirect my effort to producing these.  If anyone
wants to do it they can contact me off-list and I'll explain the
optimization problem and how to get access to the parameterization and
the gradient.  If you are really energetic I can point you to the
formulae for an analytic Hessian that I haven't yet had time to
implement.




More information about the R-help mailing list