[R] What can I use instead of ks.test for the binomial distr

(Ted Harding) Ted.Harding at manchester.ac.uk
Sat Mar 13 22:27:28 CET 2010


For testing whether x comes from a binomial distribution,
I would suggest just using a straight chi-squared test.
I'm not aware of a version of chisq.test() in R aimed at
seeing whether data match a fitted binomial (or other
specific) distribution, but it is easy to construct one:

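  x <- rbinom(10000,10,0.5)
  phat <- sum(x)/(10*length(x))          # estimate of p: mean(x)/size
  dhat <- dbinom((0:10),10,phat)         # fitted binomial probabilities
  Freq <- table(factor(x,levels=0:10))   # keep all 11 cells, even if empty
  Exp <- length(x)*dhat                  # expected frequencies
  ChiSq <- sum(((Freq-Exp)^2)/Exp); DF <- (11-1-1)
  1-pchisq(ChiSq,DF)
  # [1] 0.6788829   (from the original post; varies from run to run)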

Note the "DF <- (11-1-1)": "(11-1)" because the frequencies
must add up to 10000, so only 10 d.f. for the 11 frequencies;
and "(11-1)-1" because one parameter (p) has been fitted.
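
As a possible cross-check (a sketch only, not part of the recipe
above): chisq.test() accepts a vector of cell probabilities through
its "p" argument, so it can supply the statistic, but it reports
df = 11-1 because it does not know that p was estimated. The p-value
then has to be recomputed with the corrected d.f. The seed and the
use of factor() to keep all 11 cells are assumptions made here for
illustration.

```r
set.seed(1)                               # assumption: fixed seed, for reproducibility only
x     <- rbinom(10000, 10, 0.5)
phat  <- mean(x) / 10                     # estimate of p under the binomial model
Freq  <- table(factor(x, levels = 0:10))  # all 11 cells, even if some are empty
res   <- chisq.test(Freq, p = dbinom(0:10, 10, phat))
ChiSq <- unname(res$statistic)            # same Pearson statistic as the manual sum
DF    <- 11 - 1 - 1                       # chisq.test reports 11-1; subtract 1 for phat
1 - pchisq(ChiSq, DF)                     # p-value with the corrected d.f.
```

Only the statistic is taken from chisq.test(); its own printed
p-value uses 10 d.f. and so would be slightly too large here.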

Your second problem looks as though the question is "Do the
two samples x and x1 come from the same distribution (on 0:10)?"
(and not incorporating that it might be binomial). In that case,
probably again a simple chi-squared should be sufficient,
and this time you can use chisq.test() as it stands:

  x1<-rbinom(10000,10,0.5)
  x2<-rbinom(10000,10,0.5)
  Freq1 <- table(x1)
  Freq2 <- table(x2)
  chisq.test(cbind(Freq1,Freq2))
  #         Pearson's Chi-squared test
  # data:  cbind(Freq1, Freq2) 
  # X-squared = 8.6891, df = 10, p-value = 0.5618

However, if the question is: "Do the two samples come from
the same *binomial* distribution?", then the best approach
would be to compare the two estimates of p, assuming binomial
in each case, from sample 1 and sample 2, using either a
likelihood-ratio test or (with samples as large as the ones
you cite) simply the Normal approximation with SEs calculated
from the estimated p's.
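
A minimal sketch of both comparisons, assuming size = 10 is known
and the same in both samples (the seed and all variable names are
chosen here for illustration only). Under the binomial model each
sample amounts to sum(x) successes in 10*length(x) Bernoulli trials:

```r
set.seed(1)                      # assumption: fixed seed, for reproducibility only
n  <- 10                         # binomial size, as in the thread
x1 <- rbinom(10000, n, 0.5)
x2 <- rbinom(10000, n, 0.5)

s1 <- sum(x1); N1 <- n * length(x1)   # successes and trials, sample 1
s2 <- sum(x2); N2 <- n * length(x2)   # successes and trials, sample 2
p1 <- s1 / N1
p2 <- s2 / N2

# Normal approximation, with SEs calculated from the estimated p's
se     <- sqrt(p1 * (1 - p1) / N1 + p2 * (1 - p2) / N2)
z      <- (p1 - p2) / se
p.norm <- 2 * pnorm(-abs(z))

# Likelihood-ratio test: separate p's versus one pooled p
# (the binomial coefficients cancel, so they are omitted)
p0     <- (s1 + s2) / (N1 + N2)
loglik <- function(s, N, p) s * log(p) + (N - s) * log(1 - p)
LR     <- 2 * (loglik(s1, N1, p1) + loglik(s2, N2, p2) -
               loglik(s1, N1, p0) - loglik(s2, N2, p0))
p.lr   <- 1 - pchisq(LR, df = 1)

c(normal = p.norm, LR = p.lr)
```

With samples this large the two p-values should be very close.
prop.test(c(s1, s2), c(N1, N2)) runs a closely related chi-squared
test of equal proportions and could serve as a further check.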

Finally, if the question is: "Do the two samples come from
the same distribution, which is probably binomial but might
not be?", then you are getting into deeper waters! If that
is the real problem, then come back to the list about it.

Ted.

On 13-Mar-10 20:13:31, Tal Galili wrote:
> Thanks David,
> I apologize (I did search before posting, but only for
> "ks.test" and didn't come across the references in my
> careless skimming)
> 
> Thanks,
> Tal
> [...]
> On Sat, Mar 13, 2010 at 10:04 PM, David Winsemius
> <dwinsemius at comcast.net> wrote:
>> On Mar 13, 2010, at 2:34 PM, Tal Galili wrote:
>>> Hello all,
>>> A friend just showed me how ks.test fails to work with
>>> pbinom for small "size".
>>> Example:
>>>
>>> x<-rbinom(10000,10,0.5)
>>> x2<-rbinom(10000,10,0.5)
>>> ks.test(x,pbinom,10,0.5)
>>> ks.test(x,pbinom,size = 10, prob= 0.5)
>>> ks.test(x,x2)
>>> 
>>> The tests give significant p-values, although x did come from
>>> a binomial with size = 10, prob = 0.5.
>>
>> The first sentence of Details in the ks.test help page:
>> "If y is numeric, a two-sample test of the null hypothesis that
>> x and y were drawn from the same _continuous_ distribution is
>> performed." (_continuous_ in italics.)
>>
>> This has come up in r-help so frequently that I nominate it for
>> addition to the FAQ. Searching with RSiteSearch() on "ks.test"
>> with "ties" or "continuous" should bring up useful commentary
>> from experts.
>> David.
>>>
>>> What test should I use instead ?
>>> Thanks,
>>> Tal
>>
>> David Winsemius, MD
>> West Hartford, CT

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Mar-10                                       Time: 21:27:25
------------------------------ XFMail ------------------------------


