[R] table problems

ripley@stats.ox.ac.uk ripley at stats.ox.ac.uk
Wed Jun 12 09:13:46 CEST 2002


On Wed, 12 Jun 2002, Robin Hankin wrote:

>
> dear helplist,
>
> my student has fifty trees, numbered one to fifty, and a vector
> recording which tree a certain possum slept in on 12 nights.
>
> R> c
>  [1]  3 14 17 22 26 26 17 40 43 25 46 46
> R>
>
> Thus it slept in tree #3 on Monday, then tree #14 on Tues, and so on.
> I wish to test the null hypothesis that the animal chooses trees
> randomly; try
>
> R> table(c)
> c
>  3 14 17 22 25 26 40 43 46
>  1  1  2  1  1  2  1  1  2
> R>
>
> Thus it slept in tree #3 once, tree #14 once, tree #17 twice, etc.

Try tabulate(c), which only goes up to 46 (the largest tree number
observed).  Or, better, fill in a table that covers all 50 trees:

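tab <- rep(0, 50)                  # one count per tree, initially zero
names(tab) <- 1:50                 # label the counts by tree number
tab[names(table(c))] <- table(c)   # fill in the trees that were slept in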
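
The same 50-entry table can be built more directly. The lines below are
only a sketch: the data are retyped from the question, and the names tree,
tab1 and tab2 are mine, chosen so that the vector does not share a name
with the function c().

```r
# Tree slept in on each of the 12 nights (called `c` in the question).
tree <- c(3, 14, 17, 22, 26, 26, 17, 40, 43, 25, 46, 46)

# Option 1: tabulate() with an explicit number of bins gives a plain
# integer vector of length 50, zeros included.
tab1 <- tabulate(tree, nbins = 50)

# Option 2: a factor with all 50 levels makes table() keep the unused
# trees and label each count with its tree number.
tab2 <- table(factor(tree, levels = 1:50))

# Both routes give the same counts.
stopifnot(length(tab1) == 50, all(tab1 == as.vector(tab2)))
```

Either object can be passed to chisq.test() in place of tab.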


> Now on the null hypothesis the expected number of sleeps per tree is
> 12/50=0.24; so how do I carry out a chisquare test on the data,
> including the trees that it never slept in?
>
> chisq.test() doesn't "know" that there are actually fifty distinct
> trees (most of which were never slept in) and not nine.
>
>  > chisq.test(table(c))
>
> 	Chi-squared test for given probabilities
>
> data:  table(c)
> X-squared = 1.5, df = 8, p-value = 0.9927
>
> of course this isn't right because chisquared is > 25.8 due to the
> animal sleeping in tree #17 and tree #46 twice (and of course, df
> should be 49 because I have 50 trees).

> chisq.test(tab)

        Chi-squared test for given probabilities

data:  tab
X-squared = 63, df = 49, p-value = 0.08625

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(tab)
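
As a check on the output above, the statistic and the p-value can be
reproduced by hand from the expected count of 12/50 = 0.24 per tree. This
is a sketch with the data retyped from the question; the variable names are
mine. chisq.test() gives this warning when any expected count is below 5,
and here all fifty are.

```r
tree <- c(3, 14, 17, 22, 26, 26, 17, 40, 43, 25, 46, 46)
tab  <- tabulate(tree, nbins = 50)        # counts for all 50 trees

expected <- rep(sum(tab) / 50, 50)        # 0.24 sleeps per tree under the null
stat <- sum((tab - expected)^2 / expected)

# Upper tail of the chi-squared distribution with 50 - 1 = 49 df.
p <- pchisq(stat, df = 49, lower.tail = FALSE)

print(c(statistic = stat, p.value = p))
```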


The warning is serious: the approximation is probably dreadful for data
this sparse (the expected count is only 0.24 per tree).  In any case, is
the null hypothesis plausible: that the animal independently and uniformly
chooses a tree each night to sleep in, from exactly the 50 trees your
student labelled?

You could easily get a more accurate significance by simulation:

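doone <- function(...)
{
   # one data set simulated under the null: 12 nights, 50 equally likely trees
   c <- sample(1:50, 12, replace = TRUE)
   # tabulate over all 50 trees, keeping the zero counts
   tab <- rep(0, 50)
   names(tab) <- 1:50
   tab[names(table(c))] <- table(c)
   # keep only the chi-squared statistic
   chisq.test(tab)$statistic
}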

> table(round(sapply(1:1000, doone),3))

    38 46.333 54.667     63 71.333 79.667     88 96.333
   232    394    229     97     34      6      7      1

(Note how discrete the distribution is.)
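
In the run tabulated above, 97 + 34 + 6 + 7 + 1 = 145 of the 1000 simulated
statistics are 63 or larger, i.e. a simulated p-value of about 0.145.  The
discreteness follows from the algebra: with equal expected counts the
statistic is (50/12) * S - 12, where S is the sum of the squared counts,
and S can only take the values 12, 14, 16, ...  The sketch below redoes the
simulation in that form; the seed, the helper stat_from_counts and the
other names are my own choices, and a fresh run will not reproduce the
table above exactly.

```r
set.seed(1)   # arbitrary seed (an assumption), only so the run is repeatable

n_nights <- 12
n_trees  <- 50

# Chi-squared statistic for equal expected counts, written in terms of
# S = sum of squared counts:  X^2 = (n_trees / n_nights) * S - n_nights
stat_from_counts <- function(tab) n_trees * sum(tab^2) / n_nights - n_nights

# Observed data, retyped from the question.
tree <- c(3, 14, 17, 22, 26, 26, 17, 40, 43, 25, 46, 46)
observed <- stat_from_counts(tabulate(tree, nbins = n_trees))

# 1000 data sets under the null: each night a tree chosen uniformly at random.
sims <- replicate(1000, {
  nights <- sample(1:n_trees, n_nights, replace = TRUE)
  stat_from_counts(tabulate(nights, nbins = n_trees))
})

# Monte Carlo p-value: share of simulated statistics at least as extreme.
p_mc <- mean(sims >= observed)

# Built-in alternative: chisq.test() can simulate the p-value itself.
p_builtin <- chisq.test(tabulate(tree, nbins = n_trees),
                        simulate.p.value = TRUE, B = 1000)$p.value

print(c(observed = observed, p_mc = p_mc, p_builtin = p_builtin))
```

The two p-values will differ slightly from each other and from run to run,
since both are Monte Carlo estimates.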

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



