[R] Fisher's Exact Test

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Mon Feb 15 12:58:01 CET 1999


On 15-Feb-99 Simon Fear wrote:
> Though not immediately an R question, I wanted to comment on Peter's
> 
>>Mind you, there's a paper by Yates lying somewhere in my "must read
>>some time" stack where he argues that the 2 * p procedure is more
>>correct...
> 
> which becomes an R question if you think that there is such a thing as
> Kurt's package being "correct". Try this definition of a P value: the
> probability, under the null hypothesis, of observing data as, or more,
> extreme than that actually observed.
[snip]
> The idea of a P value is to quantify to what extent the data are
> "consistent" with the null which begs the question of what we mean by
> "consistent" (that's according to Cox and Hinkley, anyway). I reckon if
> one procedure gave you "significance" and another didn't, you wouldn't
> have strongly established the result under anybody's definitions.

No doubt this list is not where the Meaning Of Statistics should be
discussed, but I'd like to comment -- with the excuse that I shall veer
the comment towards R anyway.

1. Choice of test statistic is in principle arbitrary in the first
instance. Once chosen, it defines a "measure of discrepancy" (or
"consistency" if you will) between data and hypothesis. The
"P-value", in this context, is simply a standardised universal
measurement scale into which the value of the statistic can be mapped.

2. If a well-defined alternative hypothesis is available, usually
this can serve to determine a good statistic, since the alternative
highlights the "direction" in which you should measure discrepancy
(and often implies *how* you should measure it).

3. In the absence of (2), choice of statistic as in (1) implies a
class of alternatives such that hypotheses in this class have higher
likelihood than the Null for data which are discrepant according to
the chosen test statistic. The emergence of this class of alternatives,
as a result of the test procedure, can suggest directions for future
research.
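As a concrete illustration of (1), and of Peter's 2 * p remark, here is a minimal sketch in Python (standard library only; the function names are hypothetical, chosen just for this example). It turns the same 2x2 table into a two-sided Fisher P-value in two ways. The first counts as "more extreme" every table that is no more probable under the null than the observed one, which is the rule documented for R's fisher.test. The second doubles the smaller one-sided tail, which is the 2 * p procedure. Each is a different measure of discrepancy, and the two need not agree.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(table):
    """Null distribution of the top-left cell of a 2x2 table, conditional
    on both sets of margins (Fisher's set-up). Exact rational arithmetic
    avoids any floating-point tolerance when probabilities are compared."""
    (a, b), (c, d) = table
    r1, c1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    total = comb(n, r1)
    return {x: Fraction(comb(c1, x) * comb(n - c1, r1 - x), total)
            for x in range(lo, hi + 1)}


def p_by_probability(table):
    """Two-sided P: sum over all tables no more probable than the observed one."""
    pmf = hypergeom_pmf(table)
    p_obs = pmf[table[0][0]]
    return sum(p for p in pmf.values() if p <= p_obs)


def p_by_doubling(table):
    """Two-sided P: twice the smaller one-sided tail, capped at 1."""
    pmf = hypergeom_pmf(table)
    a = table[0][0]
    lower = sum(p for x, p in pmf.items() if x <= a)
    upper = sum(p for x, p in pmf.items() if x >= a)
    return min(Fraction(1), 2 * min(lower, upper))


table = [[3, 0], [1, 6]]
print(p_by_probability(table))  # 1/30
print(p_by_doubling(table))     # 1/15
```

For the table [[3, 0], [1, 6]] the first rule gives 1/30 (about 0.033) and the second 1/15 (about 0.067). At the 5% level one procedure "signifies" and the other does not, which is just the situation Simon describes. For [[3, 1], [1, 3]], whose null distribution is symmetric, both rules give 17/35.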

So it's all up for grabs, in general; the logical framework implied by
the above is flexible and elastic and allows scope for tailoring the test
to the needs of the application. This is the point at which you need to
think about reality. One test may give "significance", another not. If
reality has suggested an appropriate test -- as in (2) above -- that is
the one whose outcome you should favour. Otherwise, the difference
between the outcomes can tell you something about the reality you don't
yet understand well enough -- see (3) above.

The implications for any general-purpose package (such as R) are that
ideally the package should be able to support the statistician in
whatever [s]he decides is appropriate, for exploration or for final
testing, in the context of the research being carried out.

In particular this means that the investigation should not be driven by
the limitations of software. R and its ilk possess the basic mechanisms,
accessible to the user, to allow infinite flexibility of use. Certain
distinguished users, such as Peter Dalgaard, develop procedures on top of
these mechanisms which are of great use to the rest of us. The more
options they are willing to build into these procedures, the more scope
the rest of us will have when we use them (and we'll be very grateful for
it too). So I'm emphatically *pro* flexible options, and *anti*
restriction of choice on grounds of spurious "correctness".

Simon's reference to Cox & Hinkley is very much to the point. See e.g.
Ch. 3, p. 65: "It is ... necessary to have also some idea of the type of
departure from the null hypothesis which it is required to test. All sets
of data are uniquely extreme in some respects and without some idea of
what are meaningful departures from H_0 the problem of testing
consistency with it is meaningless."

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Date: 15-Feb-99                                       Time: 11:58:01
------------------------------ XFMail ------------------------------