[R] Pb with ks.test pvalue

Christoph Buser buser at stat.math.ethz.ch
Tue Mar 22 14:32:14 CET 2005


Dear Anthony

I don't know how SAS calculates the p-value, but in R the
p-value is calculated under the assumption that the parameters
of the distribution you compare your sample with are known in
advance and not estimated from the data.

In your example you estimate them from the data (by mean(w) and
sd(w)), and therefore the p-values are not reliable.
Because mean and sd are estimated from the same data, the
theoretical distribution fits your data too well.
Hence the test is too conservative and the p-values are too large.
Maybe SAS corrects for the estimation of the parameters
and therefore gets smaller p-values, but this is pure
speculation since I don't know how SAS does the
calculation.
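
One possible way to account for the estimated parameters, using
only base R, is to simulate the null distribution of D with mean
and sd re-estimated in every simulated sample. This is only a
sketch and not necessarily what SAS does; the helper ks_stat, the
seed and the number of replicates are my own choices.

```r
# Data from the original question
w <- c(3837, 3334, 2208, 1745, 2576, 3208, 3746, 3523, 3430, 3480,
       3116, 3428, 2184, 2383, 3500, 3866, 3542, 3278)

# Hypothetical helper: KS statistic D with mean and sd estimated
# from the sample itself
ks_stat <- function(x) {
  unname(ks.test(x, "pnorm", mean(x), sd(x))$statistic)
}

set.seed(1)                      # arbitrary seed, for reproducibility
d_obs  <- ks_stat(w)             # observed D
# D does not change under shifts and rescaling, so standard normal
# samples are enough to simulate its null distribution
d_null <- replicate(5000, ks_stat(rnorm(length(w))))

# Monte Carlo p-value: share of simulated D at least as large as d_obs
p_mc <- mean(d_null >= d_obs)
p_mc
```

Since the parameters are estimated in each simulated sample as
well, p_mc takes the estimation into account, which the p-value
printed by ks.test does not.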

In a simulation I created 10000 samples from a normal
distribution and applied ks.test to each of them, with mean and
sd estimated from the sample. At the 0.05 level I expected around
500 significant results by chance, but got only 1 or 2.
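
A simulation of this kind can be sketched as follows. The sample
size of 18 (the length of w) and the seed are assumptions, since
neither is stated above, so the exact count will differ from run
to run.

```r
set.seed(2005)   # arbitrary seed, only for reproducibility
n_sim <- 10000   # number of simulated samples
n_obs <- 18      # assumed sample size (same as length(w))

# For each normal sample, run ks.test against a normal distribution
# whose mean and sd are estimated from that same sample
p_values <- replicate(n_sim, {
  x <- rnorm(n_obs)
  ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

# Number of results significant at the 0.05 level; a correctly
# calibrated test would give about 0.05 * n_sim = 500
n_sig <- sum(p_values < 0.05)
n_sig
```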

I recommend using graphical methods (e.g. normal plots) to
check whether your data are normally distributed instead of
testing it.
See also ?qqnorm or ?qqplot.
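
For your data a normal plot could look like this (a minimal
sketch; the reference line from qqline is my addition):

```r
# Data from the original question
w <- c(3837, 3334, 2208, 1745, 2576, 3208, 3746, 3523, 3430, 3480,
       3116, 3428, 2184, 2383, 3500, 3866, 3542, 3278)

# Normal Q-Q plot: sample quantiles against theoretical normal quantiles
qq <- qqnorm(w)
# Reference line through the first and third quartiles
qqline(w)
```

Points that lie close to the line are compatible with a normal
distribution; systematic curvature or single points far from the
line indicate deviations.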

Regards,

Christoph Buser

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
phone: x-41-1-632-5414		fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------



Anthony Landrevie writes:
 > 
 > Hello,
 > 
 > While running normality tests in R and SAS, in order to prove the efficiency of R to my company, I noticed
 > 
 > that the results of the Anderson-Darling, Cramer-von Mises and Shapiro-Wilk tests are nearly the same in the two environments,
 > 
 > but the Kolmogorov-Smirnov p-value is really different.
 > 
 > Here is what I do:
 > 
 > > ks.test(w,pnorm,mean(w),sd(w))
 > 
 > One-sample Kolmogorov-Smirnov test
 > 
 > data: w 
 > 
 > D = 0.2143, p-value = 0.3803
 > 
 > alternative hypothesis: two.sided 
 > 
 > > w
 > 
 > [1] 3837 3334 2208 1745 2576 3208 3746 3523 3430 3480 3116 3428 2184 2383 3500 3866 3542
 > 
 > [18] 3278
 > 
 >  
 > 
 > SAS results:
 > 
 > Kolmogorov-Smirnov D 0.214278 Pr > D 0.0271
 > 
 > Why is the p-value so high under R? Much higher than with other tests.
 > 
 > Best regards,
 > 
 > Anthony Landrevie (French Student)
 > 
 > 
 > 		
 > 
 > ______________________________________________
 > R-help at stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



