[R] Kolmogorov-Smirnov tests: overflow

Arne Mueller a.mueller at cancer.org.uk
Sun Jun 23 15:24:20 CEST 2002


Hello,

thanks everybody for your quick responses. Yes, the two distributions
should be discrete. The reason I wanted to use ks.test was to get a
very rough idea of whether the two distributions are different. However,
both distributions have a similar shape (but they are definitely not
normal distributions).

I'm not very familiar with stats, though.

Below you write that the two datasets are so large that they'll be
significantly different anyway. Is that a general problem with large
datasets?
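
To make the question about large datasets concrete, here is a minimal
simulated sketch. It uses made-up normal data, not the genomesv/scopv
vectors from this thread; the sample sizes and the 0.03 shift are
arbitrary assumptions chosen only for illustration, and it assumes a
recent R in which ks.test handles samples of this size.

```r
# Illustrative simulation with made-up data (not the vectors from this thread).
set.seed(1)

# Both pairs differ by the same practically negligible shift of 0.03 sd;
# only the sample size changes.
small_x <- rnorm(200)
small_y <- rnorm(200, mean = 0.03)
big_x   <- rnorm(200000)
big_y   <- rnorm(200000, mean = 0.03)

# Two-sample KS test on each pair (continuous data, so no ties).
p_small <- ks.test(small_x, small_y)$p.value
p_big   <- ks.test(big_x, big_y)$p.value

print(c(small = p_small, big = p_big))
```

In a run like this the small pair will usually give an unremarkable
p-value, while the large pair is rejected decisively although the
underlying difference is identical. The p-value measures how detectable
a difference is, not how large it is, so with enough data almost any
difference becomes "significant".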

	regards,

	Arne

ripley at stats.ox.ac.uk wrote:
> 
> Both this and your previous post suggest that your data are from a
> discrete distribution (here as they have ties).  The standard distribution
> of the KS test is inappropriate: see the first para of `Details' in
> ?ks.test.
> 
> Even if it were not, your data sets would be so large that you would get
> statistical significance for practically insignificant differences,
> but if you really wanted to get some idea of the p value, there is
> a well-known asymptotic expansion for the significance levels in terms of
> m and n.  My memory is that there is a monograph by Jim Durbin on this,
> 
> On Sun, 23 Jun 2002, Arne Mueller wrote:
> 
> > Dear All,
> >
> > I've got a problem with ks.test. I have two really large vectors that I'd
> > like to test, but I get an overflow, and the p-value cannot be
> > calculated:
> >
> > > length(genomesv)
> > [1] 390025
> > > length(scopv)
> > [1] 140002
> > > ks.test(genomesv, scopv)
> >
> >         Two-sample Kolmogorov-Smirnov test
> >
> > data:  genomesv and scopv
> > D = 0.2081, p-value = NA
> > alternative hypothesis: two.sided
> >
> > Warning messages:
> > 1: NAs produced by integer overflow in: n.x * n.y
> > 2: NAs produced by integer overflow in: n.x * n.y
> > 3: cannot compute correct p-values with ties in: ks.test(genomesv,
> > scopv)
> >
> > Is there anything I can do about this? I'd really like to know what the
> > p-value is ;-)
> >
> >       thanks a lot for help,
> >
> >       Arne
> >
> > --
> > Arne Mueller
> > Biomolecular Modelling Laboratory
> > Cancer Research UK, London Research Institute
> > 44 Lincoln's Inn Fields
> > London WC2A 3PX, U.K.
> > phone1 : +44-(0)20-72693405      | fax  : +44-(0)20-75945789
> > phone2 : +44-(0)20-75945776      | mobil: +44-(0)7984601749
> > email  : a.mueller at cancer.org.uk | http://www.bmm.icnet.uk
> > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > Send "info", "help", or "[un]subscribe"
> > (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> >
> 
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272860 (secr)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
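
The overflow in the quoted warnings comes from multiplying two integer
sample sizes whose product exceeds R's largest integer. A rough sketch of
how one might sidestep it, and get a p-value from the classical limiting
Kolmogorov distribution, follows. This is the simple leading-order limit,
not necessarily the expansion referred to above; ks_asymptotic_p is a
hypothetical helper, not part of R; and the result assumes continuous
data, so with ties it is only a loose indication.

```r
# Sample sizes and statistic as reported in the quoted output.
n_x <- 390025L
n_y <- 140002L
D   <- 0.2081

# Integer arithmetic overflows (limit is .Machine$integer.max, 2147483647)
# and yields NA with a warning; double arithmetic does not.
overflowed <- suppressWarnings(n_x * n_y)        # NA
product    <- as.numeric(n_x) * as.numeric(n_y)  # 54604280050

# Hypothetical helper: two-sided p-value from the limiting distribution
#   P(sqrt(n*m/(n+m)) * D > lambda) -> 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lambda^2)
# The truncated series is only reliable when lambda is not very small.
ks_asymptotic_p <- function(D, n_x, n_y, terms = 100) {
  n_x <- as.numeric(n_x)  # convert first so n_x * n_y cannot overflow
  n_y <- as.numeric(n_y)
  lambda <- D * sqrt(n_x * n_y / (n_x + n_y))
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  min(1, max(0, p))       # clamp rounding noise into [0, 1]
}

p_value <- ks_asymptotic_p(D, n_x, n_y)
print(p_value)
```

For the values reported here lambda is about 67, so every term of the
series underflows and the result prints as 0 in double precision: the
difference is overwhelmingly "significant", which says little about
whether it matters in practice.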
