[R] problem with generalized least square, and lm.gls() bug

Thu Jun 21 20:06:22 CEST 2001

Naoki Takebayashi <ntakebay at bio.indiana.edu> writes:

> I'm very confused about generalized least square method (GLS), so
> please help me!  I'm trying to use GLS with R to estimate an intercept and
> slope; vector, b = (intercept, slope).
> 
> My understanding of GLS is like this:
> y = X %*% b + e, where Var(e) is symmetric dispersion matrix
> (covariance-variance matrix); V (which is theoretically known in my
> case).  We can decompose V into t(R) %*% R, where R is an upper
> triangular, square-root matrix of V, and t(R) is tranpose of R.  Then
> if we call T = Inverse(t(R)), we can multply both side of equation to
> make the problem as a ordinary least square.
> 
> T %*% y = T %*% X %*% b + T * e
> 
> To find T, "solve(t(chol(V))" should do it.  However, when I looked at
> lm.gls() from MASS library, it uses other method of obtaining T.  From
> my understanding, when I give V (my dispersion matrix) to the argument
> "W=" of lm.gls, I should also give "inverse = TRUE" (can anyone
> confirm that I should be using this option?).  Then it uses the
> following method to obtain T.
> 
> eW <- eigen(V, TRUE)
> T <- diag(eW$values ^ (-0.5))  %*% t(eW$vector)
> 
> I was checking if these two methods give same answer with a small
> matrix; so I tried this:
> 
> > tempSqRt <- matrix(c(7,0,0, 19, 76,  0, 31, 94, 51), c(3,3))
> > temp <- t(tempSqRt) %*% tempSqRt  # this is a symmetric dispersion
> matrix
> > temp.e <- eigen(temp, TRUE)
> > temp.T <- diag(temp.e$values ^ (-0.5)) %*% t(temp.e$vector)
> > temp.T
>              [,1]          [,2]         [,3]
> [1,] 0.0001090669  0.0042107823  0.006247486
> [2,] 0.0004108533 -0.0272660081  0.018370016
> [3,] 0.1487442294  0.0003382315 -0.002824703
> 
> > solve(t(chol(temp)))
>             [,1]          [,2]          [,3]
> [1,]  0.14285714 -5.569319e-18  3.101497e-17
> [2,] -0.03571429  1.315789e-02 -6.217046e-18
> [3,] -0.02100840 -2.425181e-02  1.960784e-02
> 
> So these two methods do NOT give the same answer.  I guess I'm
> missunderstanding something.  Could someone point out what's wrong?

The T matrix is not uniquely defined. Any matrix for which
crossprod(T) = solve(temp) will do. It doesn't have to be the Choleski
triangular factor. (And since R/S aren't good at using the upper-lower
diagonal matrix properties, any speed advantage of a Choleski-based
method is lost anyway.)

Check:

> solve(temp)
              [,1]          [,2]          [,3]
[1,]  2.212503e-02  3.956691e-05 -0.0004119295
[2,]  3.956691e-05  7.612803e-04 -0.0004755256
[3,] -4.119295e-04 -4.755256e-04  0.0003844675
> crossprod(temp.T)
              [,1]          [,2]          [,3]
[1,]  2.212503e-02  3.956691e-05 -0.0004119295
[2,]  3.956691e-05  7.612803e-04 -0.0004755256
[3,] -4.119295e-04 -4.755256e-04  0.0003844675
> temp.C <- solve(t(chol(temp)))
> crossprod(temp.C)
              [,1]          [,2]          [,3]
[1,]  2.212503e-02  3.956691e-05 -0.0004119295
[2,]  3.956691e-05  7.612803e-04 -0.0004755256
[3,] -4.119295e-04 -4.755256e-04  0.0003844675

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._