[R] Kolmogorov-Smirnov test

Mon Sep 26 22:18:54 CEST 2011

One additional point: you may want to look at the vis.test function in the TeachingDemos package. It offers one way of comparing distributions that focuses on meaningful, or at least visible, differences.
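
To make the sample-size and unequal-n points from the quoted discussion below concrete, here is a minimal sketch in R. Everything about the data is an assumption: the values are simulated, the exponential shape and the 0.05 shift are invented, and only the sample sizes (2700 and 3012) are borrowed from the comparison quoted below. The helper name ks_stat is hypothetical. It computes the two-sample statistic directly from the two empirical CDFs on the raw data and compares it with what ks.test reports.

```r
# Two-sample KS statistic computed by hand and compared with ks.test().
# Simulated data only: the sizes mirror the quoted comparison, but the
# distribution and the size of the shift are made up for illustration.
set.seed(1)
control <- rexp(2700, rate = 1)          # hypothetical "move duration" values
medium  <- rexp(3012, rate = 1) + 0.05   # same shape, shifted by a small amount

# Hypothetical helper: evaluate both empirical CDFs at every pooled data
# point and take the largest absolute gap. It works on the raw values;
# no binning or frequency table is involved, and the sample sizes may differ.
ks_stat <- function(x, y) {
  pooled <- sort(c(x, y))
  max(abs(ecdf(x)(pooled) - ecdf(y)(pooled)))
}

d_hand <- ks_stat(control, medium)
fit    <- ks.test(control, medium)

# The two values of D should agree for continuous data without ties.
print(c(by_hand = d_hand, ks.test = unname(fit$statistic)))
print(fit$p.value)

# Effect-size view: how large is the shift in the units of the data?
# Whether a shift of this size matters is a question for the science.
print(median(medium) - median(control))
```

If another program reports a noticeably different D on the same raw data, comparing it against a by-hand value like this is one way to see which convention it follows.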

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Greg Snow
> Sent: Monday, September 26, 2011 11:45 AM
> To: rommel; r-help at r-project.org
> Subject: Re: [R] Kolmogorov-Smirnov test
> 
> There are criteria to tell if differences are meaningless, but they
> come from the science and the researcher, not from statistical tests
> and algorithms.  Consider the question: "Is one second of difference
> important?"  Answering that needs a bunch of context.  One second can
> be a large period of time in nuclear physics or the 100 yard dash, but
> a small amount of time in geology or a marathon.  Consider the
> density function that is equal to 1 when 0 < x < 0.99 or 99.99 < x
> < 100 and 0 otherwise.  Is this distribution meaningfully different
> from the uniform between 0 and 1?  In some cases yes, in others
> probably not (and some distribution tests would have an easier or
> harder time finding this difference).
> 
> As for the differences in output between the programs: when the sample
> sizes are the same, the KS statistic is pretty straightforward; when
> they differ, there needs to be some type of interpolation of one or
> both datasets to get the comparison points.  The differences you are
> seeing are probably due to differences in how that interpolation is
> being done.  If the differences are small and do not change the
> decision, then I would not worry about them.
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of rommel
> > Sent: Saturday, September 24, 2011 2:30 AM
> > To: r-help at r-project.org
> > Subject: Re: [R] Kolmogorov-Smirnov test
> >
> > Dear Dr. Snow,
> >
> > Thank you for your reply. My answers follow each of your points.
> >
> > 1. "Are you doing the 2 sample KS test? Comparing if 2 samples come
> > from the same distribution?"
> >
> > - Yes, I am doing the 2-sample KS test.
> >
> > 2. "With 3,000 points you will still likely have power to find
> > meaningless differences, what exactly are you trying to accomplish by
> > doing the comparison?"
> >
> > - I am comparing swimming parameters of fish larvae, such as move
> >   duration and move length.
> > - The comparison is between treatments.
> > - Sample sizes, for example in one comparison pair: Control (2700
> >   data points) vs. Medium (3012 data points), with Dmax = 0.07 and
> >   p-level < 0.001.
> > - Are there criteria to know whether the differences are meaningless
> >   or not?
> >
> > 3. "I am really only familiar with the KS test done in R (which did
> > not make your list, yet you are asking on an R mailing list).
> > Differences could be due to errors, different assumptions, different
> > algorithms, sunspots, or any number of other things. Are the
> > differences meaningful? R lets you see exactly what it is doing so
> > you can check errors/assumptions/algorithms, I don't know about the
> > ones you show."
> >
> > - Sorry, I forgot to list R. I thought wessa.net was already using R,
> >   but I have also made the software comparisons using R. The results
> >   were:
> >     - With equal numbers of data points: the results are the same in
> >       both Dmax and p-value.
> >     - With unequal numbers of data points: the conclusions were the
> >       same, in that the significant difference between samples holds
> >       across the different software packages. Only the Dmax and
> >       p-values differ a bit.
> >   (Please see the attached file for the comparisons.)
> >
> > 4. "You will need to ask someone who knows the programs you reference
> > to determine what input they are expecting. R expects the raw data."
> >
> > - Thanks! I expected this also.
> >
> > Thank you.
> >
> > -Rommel
> >
> > ----- Original Message -----
> > From: "Greg Snow-2 [via R]"
> > <ml-node+s789695n3838250h62 at n4.nabble.com>
> > Date: Saturday, 24 September 2011, 12:52 am
> > Subject: Re: Kolmogorov-Smirnov test
> > To: rommel <rmaneja at ifm-geomar.de>
> >
> > Are you doing the 2 sample KS test? Comparing if 2 samples come from
> > the same distribution?
> >
> > With 3,000 points you will still likely have power to find
> > meaningless differences, what exactly are you trying to accomplish by
> > doing the comparison?
> >
> > I am really only familiar with the KS test done in R (which did not
> > make your list, yet you are asking on an R mailing list). Differences
> > could be due to errors, different assumptions, different algorithms,
> > sunspots, or any number of other things. Are the differences
> > meaningful? R lets you see exactly what it is doing so you can check
> > errors/assumptions/algorithms, I don't know about the ones you show.
> >
> > You will need to ask someone who knows the programs you reference to
> > determine what input they are expecting. R expects the raw data.
> >
> > -----Original Message-----
> > From: [hidden email] [mailto: [hidden email] ] On Behalf Of rommel
> > Sent: Friday, September 23, 2011 7:51 AM
> > To: [hidden email]
> > Subject: Re: [R] Kolmogorov-Smirnov test
> >
> > Dear Dr. Snow,
> >
> > I would like to ask for help on my three questions regarding the
> > Kolmogorov-Smirnov test.
> >
> > 1. "With a sample size over 10,000 you will have power to detect
> > differences that are not practically meaningful."
> > - Is a sample size of 3000 for each sample okay for using the
> >   Kolmogorov-Smirnov test?
> >
> > 2. I am checking whether my KS procedure is correct. I have compared
> > results of KS tests using the following 3 software packages:
> >    1. Statistica
> >    2. http://www.wessa.net/rwasp_Reddy-Moores%20K-S%20Test.wasp
> >    3. http://www.physics.csbsju.edu/stats/KS-test.html
> > I have observed that the three produced the same results only if the
> > sample sizes are equal. However, when the sample sizes are not equal,
> > I did not get similar results, particularly from the wessa.net
> > calculator. Is it allowed to do a KS test to compare samples with
> > unequal sizes?
> >
> > 3. Is it allowed to use the raw data values in doing the KS test? Or
> > should I use the frequencies obtained from a frequency distribution
> > table of the raw data from each sample? I think that when I use the
> > frequencies, the KS test will construct new cumulative fractions from
> > them, which I think is not right.
> >
> > Hope you can assist me. Thanks!
> >
> > -rommel
> >
> > --
> > View this message in context:
> > http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3836910.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 