[R] Kolmogorov-Smirnov tests: overflow

ripley at stats.ox.ac.uk
Sun Jun 23 15:37:03 CEST 2002


On Sun, 23 Jun 2002, Arne Mueller wrote:

> Hello,
>
> thanks everybody for your quick response. Yes, the two distributions
> should be discrete. The reason why I wanted to use ks.test was to get
> a very rough idea of whether the two distributions are different. However,
> both distributions have a similar shape (but they are definitely not normal
> distributions).
>
> However, I'm not very familiar with stats.
>
> Below you write that the two datasets are so large that they'll be
> significantly different anyway. Is that a general problem with large
> datasets?

That's not what I said, but what I did say is a standard problem in large
datasets.
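
To illustrate with simulated data (a minimal sketch, not your data: the
sample sizes, the shift of 0.05 standard deviations and the seed are all
arbitrary assumptions, and continuous normal samples are used so that
there are no ties), the p-value for a fixed, practically small shift
generally shrinks as n grows:

```r
# Sketch only: simulated continuous data, hypothetical shift and sizes.
set.seed(1)                 # arbitrary seed, for reproducibility only
shift <- 0.05               # assumed "practically insignificant" difference
sizes <- c(1e3, 1e5)        # assumed sample sizes, same for both samples

pvals <- sapply(sizes, function(n) {
  x <- rnorm(n)                 # reference sample
  y <- rnorm(n, mean = shift)   # same shape, mean moved by 'shift'
  ks.test(x, y)$p.value         # two-sample, two-sided KS test
})

print(data.frame(n = sizes, p.value = pvals))
```

The difference between the two populations is the same in both rows;
only the amount of data changes.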

>
> 	regards,
>
> 	Arne
>
> ripley at stats.ox.ac.uk wrote:
> >
> > Both this and your previous post suggest that your data are from a
> > discrete distribution (here as they have ties).  The standard distribution
> > of the KS test is inappropriate: see the first para of `Details' in
> > ?ks.test.
> >
> > Even if it were not, your data sets would be so large that you would get
> > statistical significance for practically insignificant differences,
> > but if you really wanted to get some idea of the p value, there is
> > a well-known asymptotic expansion for the significance levels in terms of
> > m and n.  My memory is that there is a monograph by Jim Durbin on this.
> >
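
As a rough sketch of the kind of calculation meant here, the following
uses only the leading-order Kolmogorov limiting distribution, not the
refined expansion in m and n (that substitution is mine, for
illustration).  The sizes and D are copied from the output quoted below;
the variable names are hypothetical.  It also ignores the ties, so for
discrete data the result is not a valid p-value.  The product m * n is
formed in double precision because it exceeds R's 32-bit integer range.

```r
# Values copied from the quoted ks.test output.
m <- 390025L
n <- 140002L
D <- 0.2081

# Integer multiplication overflows and gives NA (with a warning).
overflowed <- suppressWarnings(m * n)

# Converting to double first avoids the overflow.
mn <- as.numeric(m) * as.numeric(n)
lambda <- D * sqrt(mn / (as.numeric(m) + as.numeric(n)))

# Leading-order two-sided tail probability:
#   p ~ 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * lambda^2)
k <- 1:100
p_asym <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))

# For a lambda this large every term underflows to zero in double
# precision, so report the first term on the log10 scale instead.
log10_p <- (log(2) - 2 * lambda^2) / log(10)

print(c(lambda = lambda, p_asym = p_asym, log10_p = log10_p))
```
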
> > On Sun, 23 Jun 2002, Arne Mueller wrote:
> >
> > > Dear All,
> > >
> > > I've got a problem with ks.test. I have two really large vectors that I'd
> > > like to test, but I get an overflow, and the p-value cannot be
> > > calculated:
> > >
> > > > length(genomesv)
> > > [1] 390025
> > > > length(scopv)
> > > [1] 140002
> > > > ks.test(genomesv, scopv)
> > >
> > >         Two-sample Kolmogorov-Smirnov test
> > >
> > > data:  genomesv and scopv
> > > D = 0.2081, p-value = NA
> > > alternative hypothesis: two.sided
> > >
> > > Warning messages:
> > > 1: NAs produced by integer overflow in: n.x * n.y
> > > 2: NAs produced by integer overflow in: n.x * n.y
> > > 3: cannot compute correct p-values with ties in: ks.test(genomesv,
> > > scopv)
> > >
> > > Is there anything I can do about this? I'd really like to know what the
> > > p-value is ;-)
> > >
> > >       thanks a lot for help,
> > >
> > >       Arne
> > >
> > > --
> > > Arne Mueller
> > > Biomolecular Modelling Laboratory
> > > Cancer Research UK, London Research Institute
> > > 44 Lincoln's Inn Fields
> > > London WC2A 3PX, U.K.
> > > phone1 : +44-(0)20-72693405      | fax  : +44-(0)20-75945789
> > > phone2 : +44-(0)20-75945776      | mobil: +44-(0)7984601749
> > > email  : a.mueller at cancer.org.uk | http://www.bmm.icnet.uk
> > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> > > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> > > Send "info", "help", or "[un]subscribe"
> > > (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> > > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> > >
> >
> > --
> > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272860 (secr)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
