[R] Loss of numerical precision from conversion to list?

Thu Jul 20 23:09:19 CEST 2006

I'm working on an R implementation of the simulation-based finite-sample null distribution of the (R)LR test in mixed models (i.e. testing for Var(RandomEffect)=0) derived by C. M. Crainiceanu and D. Ruppert.

I'm in the beginning stages of this project. While comparing quick and dirty grid-search methods with more exact optim()/optimize()-based methods for finding the maximum of a part of the RLR test statistic, I stumbled upon the following problem:

It seems to me that R produces different results depending on whether originally identical numbers involved in the exact same computations are read from a matrix or a list.
(I need both: matrices for quick vectorized computation in the grid search, and "list-based" computation so that I can put the function to be maximized in something like mapply(...,optim(foo),...). I can elaborate if desired.)

However, the problem goes away once a number involved in the computation is set from almost zero (e-15) to 4.
I'm completely mystified by this; especially since this number that I change is NOT one of the numbers that are switched from matrix to list.
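
A direct way to test the suspicion about the conversion would be to compare the list entries with the matrix rows they were copied from. The following is only a minimal sketch on stand-in data; `m`, `l` and `same` are hypothetical names and sizes, not objects from the simulation below. identical() compares doubles exactly, so any change introduced by the copy would show up as FALSE.

```r
# Stand-in data (hypothetical sizes), not the simulation itself
set.seed(1)
m <- matrix(rchisq(6, 1), nrow = 2)

# Copy each row of the matrix into a list slot
l <- vector("list", nrow(m))
for (i in seq_len(nrow(m))) l[[i]] <- m[i, ]

# identical() tests exact equality of the stored doubles, with no tolerance
same <- vapply(seq_len(nrow(m)),
               function(i) identical(l[[i]], m[i, ]),
               logical(1))
same
# [1] TRUE TRUE
```

If every entry is TRUE, the values stored in the list are exactly those stored in the matrix, and the copy itself cannot be the source of a difference.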

Here's the code:

library(nlme)
data(Orthodont)    #108 dental measurements on 27 subjects
# m1<-lme(distance~age,random=~1|Subject,data=Orthodont)
# summary(m1)
# ...
# Random effects:
# Formula: ~1 | Subject
#          (Intercept) Residual
# StdDev:    2.114724 1.431592  -> lambda.REML=2.114^2/1.431^2 = 2.182382

#DesignMatrix for fixed Effects
X<-cbind(rep(1,108),Orthodont$age) 
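#DesignMatrix of RandomEffects
#(the 112-element pattern is recycled column by column, so that column j
# has ones in rows 4(j-1)+1 to 4j; R warns about the data length)
Z<-matrix(data=c(rep(1,4),rep(0,108)),nrow=108,ncol=27) 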

#Corr(RanEf)^0.5 = 27 x 27 Identity, since RandomIntercepts are independent
sqrt.Sigma<-diag(27) 

K<-27 #number of subjects/ random intercepts
n<-nrow(X)
p<-ncol(X)
lambda0 <- 2.182382 #actually not a sensible choice as Null-Hypothesis, but that doesn't pertain to the problem

#Projection-Matrix for Fixed-Effects-Model: Y -> errors
P0=diag(n)-X%*%solve((t(X)%*%X))%*%t(X) 

mu<-eigen(sqrt.Sigma%*%t(Z)%*%P0%*%Z%*%sqrt.Sigma)$values
# mu
# [1] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00
#[11] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00
#[21] 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 4.00000e+00 5.77316e-15
# ! Notice the last (27th) value very close to 0

nsim<-10
set.seed(10)
#nsim x K array of ChiSq(1)-variates
w.k.sq.mat<-matrix(rchisq(nsim*K,1),nrow=nsim) 
#nsim x 1 array of ChiSq(n-p-K)-variates
w.sum2<-rchisq(nsim,n-p-K)                     

### vectorized computation of nsim=10 realizations
### of a part of the RLR-statistic under the Null:
w.k.sq<- cbind(w.k.sq.mat,w.sum2)       #nsim x (K+1)
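#vector-based results:
#NB: lambda (the variance ratio at which the statistic is evaluated)
#must already be defined at this point
num.v<-  rowSums(((lambda-lambda0)*mu*w.k.sq[,-(K+1)])/(1+lambda*mu))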
den.v<-  rowSums(((1+lambda0*mu)*w.k.sq[,-(K+1)]) / (1+lambda*mu)) + w.k.sq[,K+1]

### list-based computation of nsim=10 realizations
### of a part of the RLR-statistic under the Null:
w.k.sq<-list()
length(w.k.sq)<-nsim
#put the nsim rows into list-slots:
for(i in 1:nsim) w.k.sq[[i]]<-c(w.k.sq.mat[i,],w.sum2[i]) 
num.l<-numeric(0)
den.l<-numeric(0)
for(i in 1:nsim)
{
  num.l[i]<-sum(((lambda-lambda0)*mu*w.k.sq[[i]][-(K+1)])/(1+lambda*mu))
  #exactly analogous to num.v & den.v, except list-elements instead of vector
  den.l[i]<-sum(((1+lambda0*mu)*w.k.sq[[i]][-(K+1)]) / (1+lambda*mu)) + w.k.sq[[i]][K+1]
}

#  Now the actual problem:
#  notice the discrepancies between the results from vectorized computation
#  and the results from list-based computation.
#  Since the discrepancies disappear if mu[27] is changed
#  from 5.77316e-15 to 4, I'm guessing that somewhere in the conversion to
#  "list" there must be a loss of precision. Or is there an entirely
#  different problem?

num.l
# [1] -25.93322 -17.65486 -18.80239 -19.49974 ....
num.v
# [1] -23.84733 -17.62233 -27.22975 -19.50294 ....

den.l
# [1] 117.30246  92.59041  92.91491 112.90113 ...
den.v
# [1] 115.21657  92.55789 101.34228 112.90433 ...

#now I set
mu[27]<-4
#and reran the computation of num.l /.v and den.l /.v from above:

num.l
# [1] -26.25565 -17.67423 -27.47259 -20.97961 ...
num.v
# [1] -26.25565 -17.67423 -27.47259 -20.97961 ...
den.l
# [1] 117.62489  92.60979 101.58511 114.38100 ...
den.v
# [1] 117.62489  92.60979 101.58511 114.38100 ...
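
One thing that may be worth ruling out, independently of any precision issue, is how R lines up a vector with a matrix in elementwise arithmetic: the vector is recycled down the columns of the matrix (column-major order), not across its rows. The sketch below uses made-up stand-ins (`m` for an nsim x K matrix, `v` for a length-K vector such as mu) and is not a rerun of the computation above; whether this accounts for the discrepancy has not been checked here.

```r
m <- matrix(1, nrow = 2, ncol = 3)  # stand-in for an nsim x K matrix
v <- c(10, 20, 30)                  # stand-in for a vector of length K

# Plain "*": v is recycled down the columns of m
by.recycling <- m * v
by.recycling
#      [,1] [,2] [,3]
# [1,]   10   30   20
# [2,]   20   10   30

# sweep() over margin 2 multiplies column j by v[j]
by.column <- sweep(m, 2, v, "*")

# Row-by-row product, analogous to a loop over list elements
row.wise <- t(apply(m, 1, function(r) r * v))

isTRUE(all.equal(by.column, row.wise))      # TRUE
isTRUE(all.equal(by.recycling, by.column))  # FALSE
```

The two forms coincide whenever all entries of `v` are equal, since the order of recycling then makes no difference.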

What I would like to know now is:

1) Which of the two calculations yields a more precise result?
Or rather:
2) How can I avoid these discrepancies in the future, since I need to be able to compare these two methods?
And, most importantly,
3) What in R.A.Fisher's name is happening here?

Version information:

Version 2.3.1 (2006-06-01) 
i386-pc-mingw32 
.Machine$double.eps is 2.220446e-16 (does it matter?)

Thanks for your time,

-- 
Fabian Scheipl

f.abian at gmx.net