[R] Kolmogorov Smirnov Test

Thu Nov 11 10:46:55 CET 2010

On 11-Nov-10 04:22:55, Kerry wrote:
> I'm using ks.test (mydata, dnorm) on my data.

I think your problem may lie here! If you look at the documentation
for ks.test, available with the command:
  help("ks.test")
or simply:
  ?ks.test 
you will read the following near the beginning:

Usage: ks.test(x, y, ...,
Arguments:
       x: a numeric vector of data values.
       y: either a numeric vector of data values, or a character string
          naming a cumulative distribution function or an actual
          cumulative distribution function such as 'pnorm'.

Note *cumulative* and *'pnorm'*. You say that you used 'dnorm'.
"dnorm" is R's name for the *density* function of the Normal
distribution, while the name for the *cumulative distribution*
function is "pnorm". So try the K-S test instead with

  ks.test(mydata, pnorm, ... )

where (as also stated in '?ks.test') the "..." is to be replaced
by a list of values for the parameters of the named cumulative
distribution. For example (since the parameters for pnorm are
its mean and SD):

   ks.test(mydata, pnorm, mean(mydata), sd(mydata) )
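
To make the density/cumulative distinction concrete, here is a minimal
sketch that evaluates both functions at a few points. The evaluation
points in z are arbitrary and chosen only for illustration.

```r
# pnorm is the cumulative distribution function: it rises from 0 to 1.
# dnorm is the density: it peaks at the mean and falls towards 0 in both tails.
z <- c(-3, -1, 0, 1, 3)          # arbitrary evaluation points
cdf_vals <- pnorm(z)             # strictly increasing in z
pdf_vals <- dnorm(z)             # symmetric about 0; below 0.4 for the standard Normal
print(round(cdf_vals, 4))
print(round(pdf_vals, 4))

# Parameters can also be passed by name, which guards against mixing them up:
# ks.test(mydata, "pnorm", mean = mean(mydata), sd = sd(mydata))
```

One caveat: '?ks.test' also says that for a one-sample test the
parameters should be pre-specified and not estimated from the data.
A p-value obtained with mean(mydata) and sd(mydata) is therefore best
read as approximate (it tends to come out too large).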

A toy example (comparing the two usages):

## First, using pnorm as above:
  Y <- rnorm(200)
  ks.test(Y,"pnorm",mean(Y),sd(Y))
  #         One-sample Kolmogorov-Smirnov test
  # data:  Y 
  # D = 0.0251, p-value = 0.9996
  # alternative hypothesis: two-sided 
## Note the nice P-value

## Next, using dnorm as you wrote:
  ks.test(Y,"dnorm",mean(Y),sd(Y))
  #         One-sample Kolmogorov-Smirnov test
  # data:  Y 
  # D = 0.9965, p-value < 2.2e-16
  # alternative hypothesis: two-sided 
## Note the similarity to the p-values you report!
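
Why does 'dnorm' give D close to 1? The K-S statistic is the largest
vertical gap between the empirical distribution function of the data
and the function supplied. The empirical distribution function climbs
to 1, whereas the density is nearly 0 at the largest observation, so
the gap there is almost 1. The following is a rough sketch of that
calculation by hand; ks_stat is a hypothetical helper written for this
illustration (it is not part of R), and the seed is arbitrary.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
Y <- rnorm(200)
n <- length(Y)
Ys <- sort(Y)

# Hypothetical helper: largest vertical gap between the empirical CDF
# (a step function) and a reference function f evaluated at the sorted data.
ks_stat <- function(f, ...) {
  Fx <- f(Ys, ...)
  max(c(seq_len(n) / n - Fx, Fx - (seq_len(n) - 1) / n))
}

D_pnorm <- ks_stat(pnorm, mean(Y), sd(Y))
D_dnorm <- ks_stat(dnorm, mean(Y), sd(Y))
print(c(pnorm = D_pnorm, dnorm = D_dnorm))

# The hand-computed value should agree with the statistic from ks.test itself
D_ref <- unname(ks.test(Y, "pnorm", mean(Y), sd(Y))$statistic)
print(all.equal(D_pnorm, D_ref))
```

With D that close to 1 on 200 observations, a vanishingly small
p-value is the only possible outcome, whatever the data look like.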

For the details of 'dnorm', 'pnorm' and the like, see the help at:

   ?dnorm
or
   ?pnorm

(both lead to the same page). Granted, the documentation can be hard
going for a newcomer to R: it relies heavily on cross-references,
and these are sometimes difficult to track down. So look on this
(which is one of the easier cases) as an initiation into finding
your way around R's help.

Hoping this helps,
Ted.

> I know some of my
> different variable samples (mydata1, mydata2, etc) must be normally
> distributed but the p value is always < 2.0^-16 (the 2.0 can change
> but not the exponent).
> 
> I want to test mydata against a normal distribution. What could I be
> doing wrong?
> 
> I tried instead using rnorm to create a normal distribution: y = rnorm
> (68,mean=mydata, sd=mydata), where N= the sample size from mydata.
> Then I ran the k-s: ks.test (mydata,y). Should this work?
> 
> One issue I had was that some of my data has a minimum value of 0, but
> rnorm ran as I have it above will potentially create negative numbers.
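
On the two-sample approach described above: rnorm expects single
numbers for mean and sd, so passing the whole of mydata for both will
run, but each value is then drawn from a different Normal distribution.
A minimal sketch of what was probably intended follows. The mydata
vector here is simulated stand-in data, not the original, and its
mean of 10 and SD of 2 are arbitrary.

```r
set.seed(42)                                 # arbitrary seed
mydata <- rnorm(68, mean = 10, sd = 2)       # stand-in data, NOT the original

# mean and sd must each be a single number, hence mean(mydata) and sd(mydata)
y <- rnorm(length(mydata), mean = mean(mydata), sd = sd(mydata))

res <- ks.test(mydata, y)                    # two-sample K-S test
print(res$statistic)
print(res$p.value)
```

The one-sample form with 'pnorm' is usually the more direct choice,
since a simulated comparison sample adds random variation of its own.
As for the negative numbers: a Normal distribution always allows them,
so data with a hard minimum of 0 may simply not be well described by
a Normal distribution.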
> 
> Also some of my variables will likely be better tested against
> non-normal distributions (uniform etc.), but I figure I should learn
> how to even use ks.test first.
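
Testing against another distribution follows the same pattern: name
its cumulative distribution function and supply its parameters. Here
is a small sketch for the uniform case, using 'punif'. The data in u
are simulated stand-in values, and the limits 0 and 10 are assumed to
be known in advance rather than estimated from the data.

```r
set.seed(7)                          # arbitrary seed
u <- runif(50, min = 0, max = 10)    # stand-in data, NOT the original

# min and max are taken as known beforehand, not estimated from u
res_unif <- ks.test(u, "punif", min = 0, max = 10)
print(res_unif$statistic)
print(res_unif$p.value)
```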
> 
> I used to use SPSS but am really trying to jump into R instead, but I
> find the help to assume too heavy of statistical knowledge.
> 
> I'm guessing I have a long road before I get this, so any bits of
> information that may help me get a bit further will be appreciated!
> 
> Thanks,
> kbrownk

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Nov-10                                       Time: 09:46:52
------------------------------ XFMail ------------------------------