[R] KS Test Warning Message

Christoph Buser buser at stat.math.ethz.ch
Mon Jul 10 09:35:24 CEST 2006


Dear Justin

"Ties" means that "Year5.lm$residuals" contains identical
values. Please note that you can have a large R^2 even though
your residuals are not normally distributed. A large R^2 shows
a strong linear relationship, but it says nothing about the
error distribution (see the example below).
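
As a small illustration of what ties are, here is a sketch with a
made-up vector (hypothetical values, not your actual residuals)
that shows how to check for repeated values:

```r
## Hypothetical residuals for illustration only (not Year5.lm$residuals)
res <- c(0.5, -1.2, 0.5, 0.3, -1.2)

## TRUE if at least one value occurs more than once, i.e. there are ties
any(duplicated(res))
## Number of values that repeat an earlier value
sum(duplicated(res))
## Frequency of each distinct value; counts above 1 mark the ties
table(res)
```

Running the same checks on "Year5.lm$residuals" should show which
values are repeated.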

So, to answer your question: yes, non-normal residuals can
reduce the validity of your model, especially because the tests
and confidence intervals for your parameters are based on the
normality assumption.
I'd recommend verifying the model assumptions with graphical
tools, such as a normal Q-Q plot or a Tukey-Anscombe plot
(residuals against fitted values), among others.
Try:

plot(Year5.lm)
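
If you prefer to draw the two plots by hand, the following is a
minimal sketch. Since I do not have your data, "fit" is a
hypothetical stand-in for Year5.lm, fitted to simulated data:

```r
## Hypothetical stand-in for Year5.lm (simulated data, not the original model)
set.seed(1)
x <- 1:100
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)

op <- par(mfrow = c(1, 2))
## Tukey-Anscombe plot: residuals against fitted values
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
## Normal Q-Q plot of the residuals with a reference line
qqnorm(resid(fit))
qqline(resid(fit))
par(op)
```

With your own model you would replace "fit" by Year5.lm.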

The power of the KS test is quite low, and graphical tools will
give you a hint about the true error distribution instead of
only a p-value that "tells you" that the errors are not normal.

set.seed(3)
x <- 1:100
## Errors from a t-distribution with 2 degrees of freedom (heavy tails)
y <- x + rt(100, df = 2)
## Strong linear relationship
plot(x, y)

## High R^2 due to the strong linear relationship
summary(reg <- lm(y ~ x))
## The residuals are not normally distributed
qqnorm(resid(reg))
## Low power of the KS test: the violation of the model assumption is not detected
ks.test(resid(reg), "pnorm")
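
A single simulated data set only gives an impression. To get a
rough estimate of how often the KS test rejects in this setting,
one could repeat the simulation many times, as in the sketch
below. The number of repetitions (n_sim) and the 5% level are my
own arbitrary choices. As in the example above, "pnorm" without
further arguments means the standard normal distribution.

```r
set.seed(42)
n_sim <- 200          # number of simulated data sets (arbitrary choice)
x <- 1:100

## For each simulated data set: fit the regression and store the
## p-value of the KS test on the residuals
p_vals <- replicate(n_sim, {
  y <- x + rt(100, df = 2)
  ks.test(resid(lm(y ~ x)), "pnorm")$p.value
})

## Proportion of data sets in which normality is rejected at the 5% level
mean(p_vals < 0.05)
```

The resulting proportion is a Monte Carlo estimate of the power of
the test against this particular alternative.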

Best regards,

Christoph Buser

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH Zurich	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------


justin rapp writes:
 > All,
 > 
 > Happy World Cup and Wimbledon.   This morning finds me with the first
 > of my many daily questions.
 > 
 > I am running a ks.test on residuals obtained from a regression model.
 > 
 > I use this code:
 > > ks.test(Year5.lm$residuals,pnorm)
 > 
 > and obtain this output
 > 	One-sample Kolmogorov-Smirnov test
 > 
 > data:  Year5.lm$residuals
 > D = 0.7196, p-value < 2.2e-16
 > alternative hypothesis: two.sided
 > 
 > Warning message:
 > cannot compute correct p-values with ties in: ks.test(Year5.lm$residuals, pnorm)
 > 
 > I am wondering if anybody can tell me what this error message means.
 > 
 > Also, could anybody clarify how I could have a regression model with a
 > high Rsquared, roughly .67, but with nonnormal residuals?  Does this
 > take away from the validity of my model?
 > 
 > jdr
 > 


