[R] Test if data uniformly distributed (newbie)

Fri Jun 10 21:28:05 CEST 2011

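Yes, punif is the function to use; it is the cumulative distribution function, which is what ks.test needs. However, the KS test (and the other goodness-of-fit tests, such as the chi-square test) assumes that the observations are independent. If you know that your data points sum to 1, then they are not independent. If there are more than 2 of them, they also cannot all be uniform on [0, 1]: n uniform values would have an expected sum of n/2, not 1. Also note that these tests can only rule out distributions (with a given type I error rate). They cannot confirm that the data come from a given distribution. A non-significant result just means that either the data do come from it, or there is not enough power to distinguish between the actual and the test distributions.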
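A minimal sketch in R of the points above. It uses simulated data because the original files are not available. The variable names, the seed, and the Beta(1.5, 1.5) alternative are illustrative assumptions, not part of the original question. It shows that ks.test(x, "punif") needs no extra arguments for [0, 1]. It also shows that values rescaled to sum to 1 are strongly rejected, and that failing to reject depends heavily on sample size.

```r
# Sketch: Kolmogorov-Smirnov test of uniformity on [0, 1] in base R (stats).
# punif is the uniform CDF with defaults min = 0 and max = 1; ks.test()
# evaluates it at the data points itself, so no vector of quantiles q is needed.
set.seed(1)  # arbitrary seed, only for reproducibility

# 1. Simulated data that really is iid Uniform(0, 1): 300 points.
x <- runif(300)
res_unif <- ks.test(x, "punif")  # equivalent to ks.test(x, punif, min = 0, max = 1)

# 2. The same points rescaled so they sum to 1 (a hypothetical stand-in for
#    one of the files). They are no longer independent, and every value is
#    close to 1/300, so they cannot look like Uniform(0, 1).
w <- x / sum(x)
res_sum1 <- ks.test(w, "punif")
print(res_sum1$statistic)  # close to 1: all points lie near 0

# 3. Non-rejection is not confirmation: estimated power at the 5% level
#    against a mildly non-uniform Beta(1.5, 1.5) alternative.
reject_rate <- function(n, reps) {
  mean(replicate(reps, ks.test(rbeta(n, 1.5, 1.5), "punif")$p.value < 0.05))
}
power_small <- reject_rate(20, 1000)   # small n: usually fails to reject
power_large <- reject_rate(2000, 200)  # large n: almost always rejects
print(c(small = power_small, large = power_large))
```

With 20 points the test rarely detects the Beta(1.5, 1.5) departure, so a large p-value there says little about whether the data are actually uniform.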

What is your ultimate question/goal?  Why do you care if the data is uniform or not?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Kairavi Bhakta
> Sent: Friday, June 10, 2011 11:24 AM
> To: r-help at r-project.org
> Subject: [R] Test if data uniformly distributed (newbie)
> 
> Hello,
> 
> I have a bunch of files containing 300 data points each with values from 0
> to 1 which also sum to 1 (I don't think the last element is relevant
> though). In addition, each data point is annotated as an "a" or a "b".
> 
> I would like to know in which files (if any) the data is uniformly
> distributed.
> 
> I used Google and found out that a Kolmogorov-Smirnov or a Chi-square
> goodness-of-fit test could be used. Then I looked up ?kolmogorov and found
> "ks.test", but the example there is for the normal distribution and I am not
> sure how to adapt it for the uniform distribution. I did ?runif and read
> about the uniform distribution but it doesn't say what the "cumulative
> distribution" is. Is it "punif", like "pnorm"? I thought of that because I
> found a message on this list where someone was told to use "pnorm" instead
> of "dnorm". But the help page on the uniform distribution says punif is the
> "distribution function". Are the "cumulative distribution" and the
> "distribution function" the same thing? Having several names for the same
> thing has always confused me very much in statistics.
> 
> Also, I am not sure whether I need to specify any parameters for the
> distribution and which. I thought maybe I should specify "min=0" and "max=1"
> but those appear to be the defaults. Do I need to specify q, the vector of
> quantiles?
> 
> So is
> > ks.test(x, punif)
> correct or not for what I am attempting to do?
> 
> After this I will also need to find out whether the a's and b's are
> distributed randomly in each file. I would be grateful for any pointers
> although I have not researched this issue yet.
> 
> Kairavi.
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.