[R] Kolmogorov-Smirnov-Test on binned data, I guess gumbel-distributed data
Jochen1980
info at jochen-bauer.net
Thu Nov 3 01:08:19 CET 2011
Hi R-Users,
I have read several texts on KS tests. Most authors state that the KS test
is not suitable for binned data, but some refer to 'other' authors who claim
that it is fine for binned data.
I searched for sources but could not find any examples confirming that it is
okay to use the KS test on binned data. Do you have links to articles or
tutorials?
In any case, I am looking for a test that backs up my assumption that my
data is Gumbel-distributed. I estimated the Gumbel parameters mu and beta,
and judging from the resulting plots, the fit looks quite good to me.
You can find the plot, the related data, and the R script here:
www.jochen-bauer.net/downloads/kstest/Rplots-1000.pdf
http://www.jochen-bauer.net/downloads/kstest/rm2700-1000.txt
http://www.jochen-bauer.net/downloads/kstest/rcalc.R
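For reference, here is a minimal sketch of the estimation step followed by a
one-sample KS test on the raw (unbinned) values. It assumes method-of-moments
estimates, which may differ from what rcalc.R does. The helper names
pgumbel, rgumbel and fit_gumbel_mom and the simulated vector x are
illustrative stand-ins, not taken from the script. One caveat: because mu and
beta are estimated from the same sample that is being tested, the standard KS
p-value tends to come out too large.

```r
# Gumbel helpers (base R has none built in); mu = location, beta = scale.
pgumbel <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))
rgumbel <- function(n, mu, beta) mu - beta * log(-log(runif(n)))

# Method-of-moments estimates, using
#   sd(X) = beta * pi / sqrt(6)  and  mean(X) = mu + gamma * beta.
fit_gumbel_mom <- function(x) {
  euler_gamma <- 0.5772156649
  beta <- sd(x) * sqrt(6) / pi
  mu <- mean(x) - euler_gamma * beta
  c(mu = mu, beta = beta)
}

set.seed(1)
# Simulated stand-in for the 1000 permutation values (assumption).
x <- rgumbel(1000, mu = 10, beta = 2)
est <- fit_gumbel_mom(x)

# One-sample KS test of the raw values against the fitted Gumbel CDF.
ks_raw <- ks.test(x, pgumbel, mu = est[["mu"]], beta = est[["beta"]])
print(est)
print(ks_raw)
```

The point of the sketch is that ks.test gets the individual values and a
distribution function, not histogram heights.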
The story about the data:
Which test should I choose if the KS test is not appropriate? I get very
high p-values when I compare the histogram heights of data row 1 with the
fitted Gumbel distribution function evaluated at the bin midpoints. Most of
the time the KS test gives distances of 0.01 and p-values of 0.99 or 1. That
seems suspiciously high to me. On the other hand, my plots look good. As you
can see, in my first experiment I sampled 1000 values. In a second
experiment I created only 50 random values for the Gumbel parameter
estimation. I am trying to reduce the number of permutations so that I can
produce results faster, but I need to find out at what point the data no
longer fits a Gumbel distribution. The results surprised me: I expected the
tests and plots to get worse, but I still got high p-values for the KS test
and a nice-looking plot.
www.jochen-bauer.net/downloads/kstest/Rplots-0050.pdf
http://www.jochen-bauer.net/downloads/kstest/rm2700-0050.txt
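One candidate for binned data is a chi-square goodness-of-fit test on the bin
counts. The sketch below is only an illustration under assumptions: the
function gumbel_chisq is hypothetical, the data are simulated, and the
parameters come from method-of-moments estimates. The degrees of freedom are
reduced by the two estimated parameters, which is only approximate when the
parameters are estimated from the unbinned values. The usual rule of thumb of
expected counts of roughly 5 or more per bin applies.

```r
pgumbel <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))

# Chi-square goodness of fit for binned counts against a fitted Gumbel.
#   counts: observed count per bin (length k)
#   breaks: bin edges (length k + 1)
#   n_est:  number of parameters estimated from the data (mu and beta -> 2)
gumbel_chisq <- function(counts, breaks, mu, beta, n_est = 2) {
  cdf <- pgumbel(breaks, mu, beta)
  cdf[1] <- 0                # fold the lower tail into the first bin
  cdf[length(cdf)] <- 1      # fold the upper tail into the last bin
  expected <- sum(counts) * diff(cdf)
  stat <- sum((counts - expected)^2 / expected)
  df <- length(counts) - 1 - n_est
  list(statistic = stat, df = df, expected = expected,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

set.seed(2)
# Simulated Gumbel(mu = 10, beta = 2) values as a stand-in (assumption).
x <- 10 - 2 * log(-log(runif(1000)))

# Ten bins with edges at the sample deciles, so each holds about 100 values.
breaks <- quantile(x, probs = seq(0, 1, by = 0.1))
counts <- as.vector(table(cut(x, breaks, include.lowest = TRUE)))

# Method-of-moments estimates of the Gumbel parameters.
beta_hat <- sd(x) * sqrt(6) / pi
mu_hat <- mean(x) - 0.5772156649 * beta_hat

res <- gumbel_chisq(counts, breaks, mu_hat, beta_hat)
print(res[c("statistic", "df", "p.value")])
```

Unlike a KS test fed with histogram heights, this compares observed and
expected counts bin by bin, so the sample size enters the statistic.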
Besides the shuffled data from my randomisation test, there are also real
data values. I calculated the p-value of my real data points occurring under
the estimated Gumbel distribution. The p-values from the 1000-permutation
experiment and the 50-permutation experiment are very strongly correlated,
at around 0.98, according to both the Pearson and the Spearman correlation
coefficients. I guess this supports my observation that neither the plots
nor the KS tests get worse.
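The comparison described above could be sketched roughly as follows. This is
a simulation under assumptions, not the actual analysis: it assumes one
Gumbel fit per data row, an upper-tail p-value for the real value, and
method-of-moments estimates. All names (upper_p, n_rows, perm, real_val) and
all numbers are hypothetical.

```r
pgumbel <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))
rgumbel <- function(n, mu, beta) mu - beta * log(-log(runif(n)))

# Method-of-moments estimates of the Gumbel location and scale.
fit_gumbel_mom <- function(x) {
  beta <- sd(x) * sqrt(6) / pi
  c(mu = mean(x) - 0.5772156649 * beta, beta = beta)
}

# Upper-tail p-value of a value under a fitted Gumbel (assumed direction).
upper_p <- function(x, est) 1 - pgumbel(x, est[["mu"]], est[["beta"]])

set.seed(3)
n_rows <- 40                        # hypothetical number of data rows
p_1000 <- numeric(n_rows)
p_50 <- numeric(n_rows)
for (i in seq_len(n_rows)) {
  mu_i <- runif(1, 5, 15)           # row-specific null parameters (made up)
  beta_i <- runif(1, 1, 3)
  perm <- rgumbel(1000, mu_i, beta_i)        # stand-in for permutation values
  real_val <- rgumbel(1, mu_i, beta_i) + 2   # stand-in for the real value
  p_1000[i] <- upper_p(real_val, fit_gumbel_mom(perm))
  p_50[i] <- upper_p(real_val, fit_gumbel_mom(perm[1:50]))
}

# Agreement between the two experiments.
r_pearson <- cor(p_1000, p_50, method = "pearson")
r_spearman <- cor(p_1000, p_50, method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))
```

A high correlation here indicates that the parameter estimates from 50
values are close to those from 1000 values. It does not by itself show that
the Gumbel model is the right one.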
I hope I have described my situation clearly and that you can give me some
hints: literature, other tests, or support for my guess that my data is
Gumbel-distributed.
Thanks in advance.
Jochen
