[R] general question on binomial test / sign test

(Ted Harding) Ted.Harding at manchester.ac.uk
Thu Sep 2 23:59:27 CEST 2010


On 02-Sep-10 18:01:55, Greg Snow wrote:
> Just to add to Ted's addition to my response.  I think you are moving
> towards better understanding (and your misunderstandings are common),
> but to further clarify:
> [Wise words about P(A|B), P(B|A), P-values, etc., snipped]
> 
> The real tricky bit about hypothesis testing is that we compute a
> single p-value, a single observation from a distribution, and based on
> that try to decide if the process that produced that observation is a
> uniform distribution or something else (that may be close to a uniform
> or very different).

Indeed. And this is precisely why I began my original reply as follows:

>> Zitat von Ted.Harding at manchester.ac.uk:
>>> [...]
>>> The general logic of a significance test is that a test statistic
>>> (say T) is chosen such that large values represent a discrepancy
>>> between possible data and the hypothesis under test. When you
>>> have the data, T evaluates to a value (say t0). The null hypothesis
>>> (NH) implies a distribution for the statistic T if the NH is true.
>>>
>>> Then the value of Prob(T >= t0 | NH) can be calculated. If this is
>>> small, then the probability of obtaining data at least as discrepant
>>> as the data you did obtain is small; if sufficiently small, then
>>> the conjunction of NH and your data (as assessed by the statistic T)
>>> is so unlikely that you can decide not to believe that it is
>>> possible.
>>> If you so decide, then you reject the NH because the data are so
>>> discrepant that you can't believe it!

The point is that the test statistic T represents *discrepancy*
between data and NH in some sense. In what sense? That depends on
what you are interested in finding out; and, whatever it is,
there will be some T that represents it.

It might be whether two samples come from distributions with equal
means, or not. Then you might use T = mean(Ysample) - mean(Xsample).
Large values of |T| represent discrepancy (in either direction)
between data and an NH that the true means are equal: large values
of T represent discrepancy in the positive direction, and large
values of -T discrepancy in the negative direction. Or it might be
whether or
not the two samples are drawn from populations with equal variances,
when you might use T = var(Ysample)/var(Xsample). Or it might be
whether the distribution from which X was sampled is symmetric,
in which case you might use T = skewness(Xsample). Or you might be
interested in whether the numbers falling into disjoint classes
are consistent with hypothetical probabilities p1,...,pk of
falling into these classes -- in which case you might use the
chi-squared statistic T = sum(((ni - N*pi)^2)/(N*pi)). And so on.
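
For concreteness, here is a small R sketch of some of these statistics.
The samples, the counts ni and the hypothetical probabilities are all
made up, purely for illustration:

  # Two simulated samples (illustrative only)
  set.seed(1)
  Xsample <- rnorm(30)
  Ysample <- rnorm(30)

  # Difference of means
  T.mean <- mean(Ysample) - mean(Xsample)

  # Ratio of variances
  T.var  <- var(Ysample) / var(Xsample)

  # Chi-squared statistic for counts ni against hypothetical
  # class probabilities pk (both invented here)
  ni <- c(18, 22, 20)
  pk <- c(1/3, 1/3, 1/3)
  N  <- sum(ni)
  T.chisq <- sum((ni - N*pk)^2 / (N*pk))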

Once you have decided on what "discrepant" means, and chosen a
statistic T to represent discrepancy, then the NH implies a
distribution for T and you can calculate
  P-value = Prob(T >= t0 | NH)
where t0 is the value of T calculated from the data.

*THEN* small P-value is in direct correspondence with large T,
i.e. small P is equivalent to large discrepancy. And it is also
the direct measure of how likely you were to get so large a
discrepancy if the NH really was true.
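
Tying this back to the binomial/sign test that started the thread,
here is a sketch in R of calculating Prob(T >= t0 | NH) directly,
where T is the number of "successes" in n trials and the NH is
p = 0.5 (the values of n and t0 below are arbitrary):

  n  <- 20    # number of trials (arbitrary)
  t0 <- 15    # observed number of successes (arbitrary)

  # P-value = Prob(T >= t0 | NH), with T ~ Binomial(n, 0.5) under the NH
  1 - pbinom(t0 - 1, size = n, prob = 0.5)

  # binom.test() gives the same one-sided tail probability
  binom.test(t0, n, p = 0.5, alternative = "greater")$p.value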

Thus the P-values, calculated from the distribution of (T | NH),
are ordered, not just numerically from small P to large, but also
equivalently by discrepancy (from large discrepancy to small).

Thus the uniform distribution of P under the NH does not just
mean that any value of P is as likely as any other (which might
prompt "So what? Why prefer one P-value to another?").

We also have that different parts of the [0,1] P-scale have
different *meanings* -- the parts near 0 are highly discrepant
from NH, the parts near 1 are highly consistent with NH,
*with respect to the meaning of "discrepancy" implied by the
choice of test statistic T*.
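
You can see the uniform distribution of P under the NH by simulation;
the one-sample t-test on normal data is just a convenient case where
the uniformity is exact (a sketch, with an arbitrary sample size):

  set.seed(42)
  pvals <- replicate(10000, t.test(rnorm(25))$p.value)
  hist(pvals, breaks = 20)   # roughly flat over [0,1]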

So it helps to understand hypothesis testing if you keep in
mind what the test statistic T *represents* in real terms.

Greg's point about "try to decide if the process that produced that
observation is a uniform distribution or something else (that may
be close to a uniform or very different)" is not in the first instance
relevant to the direct interpretation of small P-value as large
discrepancy -- that involves only the Null Hypothesis NH, under
which the P-values have a uniform distribution.

Where it comes into its own is that an Alternative Hypothesis AH
would correspond to some degree of discrepancy of a certain kind,
and if T is well chosen then its distribution under AH will give
large values of T greater probability than they would get under NH.
Thus the AHs that are implied by a large value of a certain test
statistic T are those AHs that give such values of T greater
probability than they would get under NH. Thus we are now getting
into the domain of the Power of the test to detect discrepancy.
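
As a sketch of that last point (with an arbitrary alternative),
compare the distribution of P under the NH with its distribution
under an AH; the proportion of P-values below the significance level
under the AH is the Power of the test against that AH:

  set.seed(42)
  p.NH <- replicate(10000, t.test(rnorm(25, mean = 0.0))$p.value)
  p.AH <- replicate(10000, t.test(rnorm(25, mean = 0.5))$p.value)
  mean(p.NH <= 0.05)   # close to 0.05: the significance level
  mean(p.AH <= 0.05)   # much larger: the Power against this AH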

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Sep-10                                       Time: 22:59:23
------------------------------ XFMail ------------------------------


