[R] Condition indexes and variance inflation factors

Thu Jul 24 14:24:35 CEST 2003

Thanks for all the help.

Juergen Gross supplied a program which does just what Belsley
suggested.
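For readers who have not seen Belsley's procedure, here is a minimal NumPy sketch of it as I understand it. It is not Juergen Gross's program, and the function name `belsley_diagnostics` and the simulated data are hypothetical. The design matrix keeps its column of ones. Each column is scaled to unit length but not centred. The condition indices are the largest singular value divided by each singular value. The variance-decomposition proportions show how much of each coefficient's variance is tied to each singular value.

```python
import numpy as np

def belsley_diagnostics(X):
    """Condition indices and variance-decomposition proportions.

    X is the n x p design matrix *including* the intercept column; columns are
    scaled to unit length but deliberately not centred (Belsley's convention).
    """
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)              # unit-length columns
    _, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_idx = d.max() / d                          # one index per singular value
    # phi[j, k] = v_jk^2 / d_k^2: share of var(b_j) tied to singular value k
    phi = (Vt.T ** 2) / d ** 2
    props = phi / phi.sum(axis=1, keepdims=True)    # each coefficient's shares sum to 1
    return cond_idx, props.T  # rows: singular values; columns: coefficients

# Hypothetical example: x2 is x1 plus a little noise
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(10, 1, n)
x2 = x1 + rng.normal(0, 0.01, n)
X = np.column_stack([np.ones(n), x1, x2])
ci, vdp = belsley_diagnostics(X)
print(np.round(ci, 1))
print(np.round(vdp, 2))
```

The rows to look at are those with large condition indices. Within such a row, the coefficients that have high variance proportions are the variables involved in that near-dependency.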

Chuck Cleland, John Fox and Andy Liaw all made useful programming
suggestions.

John Fox asked

<<<
(1) I've never liked this approach for a model with a constant, where it
makes more sense to me to centre the data. I realize that opinions differ
here, but it seems to me that failing to centre the data conflates
collinearity with numerical instability.
>>>

Opinions do differ.  A few years ago, I could have given more details
(my dissertation was on this topic, but a lot of the details have
disappeared from memory); I think, though, that Belsley is looking for a
measure that deals not only with collinearity, but with several other
problems, including numerical instability (the subtitle of his later
book is "Collinearity and Weak Data in Regression"). I remember being
convinced that centering was generally not a good idea, but there are
lots of people who disagree and who know a lot more statistics than I
do.
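A small simulated sketch makes the disagreement concrete. The helper name `max_condition_index` and the data are hypothetical. The two predictors are generated independently, but both lie far from the origin. The sketch compares the largest condition index in two cases. In the first, the intercept is kept and the data are not centred (Belsley's convention). In the second, the predictors are centred and then scaled (Fox's preference).

```python
import numpy as np

def max_condition_index(M):
    """Largest condition index of M after scaling its columns to unit length."""
    Ms = M / np.linalg.norm(M, axis=0)
    d = np.linalg.svd(Ms, compute_uv=False)
    return d.max() / d.min()

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(100, 1, n)   # far from the origin, small spread
x2 = rng.normal(100, 1, n)   # generated independently of x1

# Belsley-style: intercept column kept, data not centred
uncentred = max_condition_index(np.column_stack([np.ones(n), x1, x2]))
# Centred predictors (the intercept then drops out), scaled to unit length
Z = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])
centred = max_condition_index(Z)
print(f"uncentred: {uncentred:.1f}  centred: {centred:.2f}")
```

With uncentred data, the large index reflects each predictor's near-dependency with the constant. With centred data, the index is close to 1 because the predictors are unrelated. Whether that first number flags a real problem or an artifact of not centring is exactly the point in dispute.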

<<<
(2) I also disagree with the comment that condition indices are easier to
interpret than variance-inflation factors. In either case, since
collinearity is a continuous phenomenon, cutoffs for large values are
necessarily arbitrary.
>>>

While any cutoff is arbitrary (and Belsley advises against applying one
rigidly), he does provide some evidence of how regression estimates are
affected at different levels of the condition index.
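For comparison, the variance-inflation factors themselves are easy to compute. This is a minimal NumPy sketch with hypothetical function names and simulated data. It uses the identity that VIF_j is the j-th diagonal element of the inverse correlation matrix of the predictors. That quantity equals 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors, and the second function checks it directly.

```python
import numpy as np

def vif(X):
    """VIFs for the columns of a predictor matrix X (no intercept column)."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

def vif_by_regression(X, j):
    """The same quantity computed directly as 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    y = X[:, j]
    others = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

# Hypothetical data: x2 is correlated with x1, x3 is unrelated
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
print(np.round(vif(X), 2))
```

Unlike the condition indices, each VIF is attached to a single coefficient. It does not say which other variables share the dependency.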

<<<
(3) If you're interested in figuring out which variables are involved in
each collinear relationship, then (for centred and scaled data) you can
equivalently (and to me, more intuitively) work with the
principal-components analysis of the predictors.
>>>

This would also work.  
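A minimal NumPy sketch of that principal-components route, using hypothetical simulated data, follows. The eigenvalues of the predictors' correlation matrix are the squared singular values of the centred and scaled data, so their square-root ratios give the same condition indices. The loadings of the smallest component show which variables enter the near-dependency.

```python
import numpy as np

# Hypothetical data with one near-exact relation: x3 is about x1 + x2
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2, x3])

# PCA via the correlation matrix, i.e. on centred and scaled predictors
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
cond_idx = np.sqrt(eigvals.max() / eigvals)   # condition indices of the standardized data
smallest = eigvecs[:, 0]                      # loadings of the smallest component
print(np.round(cond_idx, 1))
print(np.round(smallest, 2))
```

All three variables have sizeable loadings on the smallest component, and x3's loading has the opposite sign from the others. This reflects the relation x3 - x1 - x2 ≈ 0 built into the data.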

<<<
(4) I have doubts about the whole enterprise. Collinearity is one source of
imprecision -- others are small sample size, homogeneous predictors, and
large error variance. Aren't the coefficient standard errors the bottom
line? If these are sufficiently small, why worry?
>>>
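The four sources listed in (4) correspond to the four factors of the standard expression for the sampling variance of an OLS slope in a model with an intercept. Here sigma^2 is the error variance, n is the sample size, s_{x_j}^2 is the sample variance of x_j, and R_j^2 is the squared multiple correlation from regressing x_j on the other predictors:

```latex
\operatorname{Var}(b_j)
  = \frac{\sigma^2}{(n-1)\, s_{x_j}^2} \cdot \frac{1}{1 - R_j^2}
```

The last factor is the variance-inflation factor. The others capture large error variance, small samples, and homogeneous predictors.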

I think (correct me if I am wrong) that the standard errors and the
condition indices serve very different purposes. The condition indices are
supposed to indicate whether small changes in the input data could make big
differences in the results. Belsley provides some examples where a tiny
change in the data produces completely different results (e.g., different
standard errors, different coefficients, even with reversed signs, and so
on).
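Here is a constructed toy example of that kind of sensitivity. The data are hypothetical and are not one of Belsley's examples. The predictor x2 differs from x1 only in the third decimal place. Shifting each response value by 0.002 moves the fitted coefficients from roughly (0, 1, 1) to roughly (0, 3, -1), which reverses the sign on x2.

```python
import numpy as np

# Hypothetical toy data: x2 differs from x1 only in the third decimal place
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
s = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
x2 = x1 + 0.001 * s
X = np.column_stack([np.ones(5), x1, x2])   # intercept, x1, x2

# Two response vectors that differ by 0.002 in every observation
y_a = 2 * x1 + 0.001 * s
y_b = 2 * x1 - 0.001 * s

b_a, *_ = np.linalg.lstsq(X, y_a, rcond=None)
b_b, *_ = np.linalg.lstsq(X, y_b, rcond=None)
print("max change in y:", np.max(np.abs(y_a - y_b)))
print("coefficients with y_a:", np.round(b_a, 3))
print("coefficients with y_b:", np.round(b_b, 3))
```

Both responses lie exactly in the column space of X, so each fit is exact. The entire swing comes from the near-dependency between x1 and x2.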

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)