[R] Tests for Two Independent Samples

(Ted Harding) Ted.Harding at manchester.ac.uk
Fri Jul 31 18:54:03 CEST 2009


On 31-Jul-09 13:38:10, tedzzx wrote:
> Dear R users,
> I have got two samples: 
> sample A, with 223 observations:
>    sample A has five categories: 1,2,3,4,5 (I use the numbers
>    1,2,3,4,5 to label the five different categories)
>    there are 5 observations in category 1; 81 observations in
>    category 2; 110 observations in category 3; 27 observations
>    in category 4; 0 observations in category 5.
> To present the sample in R: a<-rep(1:5, c(5,81,110,27,0))
> 
> sample B, with 504 observations:
>    sample B also has the same five categories: 1,2,3,4,5 
>    there are 6 observations in category 1; 127 observations in
>    category 2; 297 observations in category 3; 72 observations
>    in category 4; 2 observations in category 5.
> To present the sample in R: b<-rep(1:5, c(6,127,297,72,2))
> 
> I want to test whether these two samples differ significantly
> in distribution (i.e. a test for two independent samples).
> 
> I found a website at:
> http://faculty.chass.ncsu.edu/garson/PA765/mann.htm
> 
> This page shows four nonparametric tests, but I can only find the
> Kolmogorov-Smirnov Z test:
> res<-ks.test(a,b)
> 
> Can anyone tell me which package has the other 3 tests? Or is there
> any other test for my question?
> Thanks in advance
> Ted

If your "1,2,3,4,5" are simply nominal codes for the categories,
then you may be satisfied with a Fisher test or simply a chi-squared
test (using simulated P-values in view of the low frequencies in
some cells).

Taking your data:

  A <- c(5,81,110,27,0)    # counts for sample A (categories 1..5)
  B <- c(6,127,297,72,2)   # counts for sample B
  M <- cbind(A,B)          # 5 x 2 contingency table
  D <- colSums(M)          # sample sizes: 223 and 504
  P <- M %*% diag(1/D)     # within-sample (column) proportions
  P
  #            [,1]        [,2]
  # [1,] 0.02242152 0.011904762
  # [2,] 0.36322870 0.251984127  ## So the main differences between
  # [3,] 0.49327354 0.589285714  ## A and B are in these two categories
  # [4,] 0.12107623 0.142857143
  # [5,] 0.00000000 0.003968254
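
As an aside, the same column proportions can be obtained directly
with prop.table() in base R, avoiding the explicit matrix
multiplication:

  prop.table(M, margin=2)  ## divide each column by its column sum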

  fisher.test(M,simulate.p.value = TRUE,B=100000)
  #  Fisher's Exact Test for Count Data with simulated p-value
  #  (based on 1e+05 replicates)
  #  data:  M 
  #  p-value = 0.01594

  chisq.test(M,simulate.p.value=TRUE,B=100000)
  #  Pearson's Chi-squared test with simulated p-value
  #  (based on 1e+05 replicates)
  #  data:  M 
  #  X-squared = 11.7862, df = NA, p-value = 0.01501
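
To see why simulated P-values are sensible here, one can inspect the
expected counts under independence. The cells for categories 1 and 5
fall well below the usual rule of thumb of 5 (roughly 3.4 and 7.6 for
category 1, and 0.6 and 1.4 for category 5):

  suppressWarnings(chisq.test(M))$expected  ## expected counts under
                                            ## independence; warning
                                            ## about accuracy suppressed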

So the P-values are similar in both tests.
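
If, on the other hand, you regard the codes 1-5 as ordered (which is
what the Mann-Whitney approach on the page you cite assumes), then a
Wilcoxon rank-sum test on the raw codes would be one option; with
samples of this size and this many ties, R uses the normal
approximation with continuity correction:

  a <- rep(1:5, c(5,81,110,27,0))
  b <- rep(1:5, c(6,127,297,72,2))
  wilcox.test(a, b)   ## Mann-Whitney U / Wilcoxon rank-sum test
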
(Another) Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Jul-09                                       Time: 17:53:58
------------------------------ XFMail ------------------------------



