[R] Which Durbin-Watson is correct? (weights involved) - using durbinWatsonTest and dwtest (packages car and lmtest)

Fri Aug 12 20:42:25 CEST 2011

On Fri, 12 Aug 2011, Dimitri Liakhovitski wrote:

> Hello!
>
> I have a data frame mysample (sorry for a long way of creating it
> below - but I need it in this form, and it works). I regress Y onto X1
> through X11 - first without weights, then with weights:
>
> regtest1<-lm(Y~., data=mysample[-13]))
> regtest2<-lm(Y~., data=mysample[-13]),weights=mysample$weight)
> summary(regtest1)
> summary(regtest2)
>
> Then I calculate Durbin-Watson for both regressions using 2 different packages:
>
> library(car)
> library(lmtest)
>
> durbinWatsonTest(regtest1)[2]
> dwtest(regtest1)$stat
>
> durbinWatsonTest(regtest2)[2]
> dwtest(regtest2)$stat
>
> When there are no weights, the Durbin-Watson statistic is the same.
> But when there are weights, 2 packages give Durbin-Watson different
> statistics. Anyone knows why?

The result of dwtest() is wrong. Internally, dwtest() extracts the model 
matrix and response (but no weights) and does all processing based on 
these. Thus, it computes the DW statistic for regtest1 not regtest2.

I've just added a patch to my source code which catches this problem and 
throws a meaningful error message. It will be part of the next release 
(0.9-29) in due course.

Of course, this doesn't help you with computing the DW statistic for the 
weighted regression but hopefully it reduces the confusion about the 
different behaviors...
Z

> Also, it's interesting that both of them are also different from what
> SPSS spits out...
>
> Thank you!
> Dimitri
>
>
> ############################################
> ### Run the whole code below to create mysample:
>
> intercor<-0.3 	# intercorrelation among all predictors
> k<-10 		# number of predictors
> sigma<-matrix(intercor,nrow=k,ncol=k) # matrix of intercorrelations
> among predictors
> diag(sigma)<-1
>
> require(mvtnorm)
> set.seed(123)
> mypop<-as.data.frame(rmvnorm(n=100000, mean=rep(0,k), sigma=sigma,
> method="chol"))
> names(mypop)<-paste("x",1:k,sep="")
> set.seed(123)
> mypop$x11<-sample(c(0,1),100000,replace=T)
>
> set.seed(123)
> betas<-round(abs(rnorm(k+1)),2) # desired betas
> Y<-as.matrix(mypop) %*% betas
> mypop<-cbind(mypop, Y)
> rSQR<-.5
> VARofY<- mean(apply(as.data.frame(mypop$Y),2,function(x){x^2})) -
> mean(mypop$Y)^2
> mypop$Y<-mypop$Y + rnorm(100000, mean=0, sd=sqrt(VARofY/rSQR-VARofY))
>
> n<-200
> set.seed(123)
> cases.for.sample<-sample(100000,n,replace=F)
> mysample<-mypop[cases.for.sample,]
> mysample<-cbind(mysample[k+2],mysample[1:(k+1)])  #dim(sample)
> weight<-rep(1:10,20);weight<-weight[order(weight)]
> mysample$weight<-weight
>
>
>
> -- 
> Dimitri Liakhovitski
> marketfusionanalytics.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>