[Rd] Printing the null hypothesis

Liviu Andronic landronimirc at gmail.com
Sun Aug 16 18:30:15 CEST 2009


On 8/16/09, Ted Harding <Ted.Harding at manchester.ac.uk> wrote:
>  > Oh, I had a slightly different H0 in mind. In the given example,
>  > cor.test(..., met="kendall") would test "H0: x and y are independent",
>  > but cor.test(..., met="pearson") would test: "H0: x and y are not
>  > correlated (or `are linearly independent')" .
>
>
> Ah, now you are playing with fire! What the Pearson, Kendall and
>  Spearman coefficients in cor.test measure is *association*. OK, if
> the results clearly indicate association, then the variables are
>  not independent. But it is possible to have two variables x, y
>  which are definitely not independent (indeed one is a function of
>  the other) which yield zero association by any of these measures.
>
>  Example:
>   x <-  (-10:10) ; y <- x^2 - mean(x^2)
>   cor.test(x,y,method="pearson")
>   #       Pearson's product-moment correlation
>   # t = 0, df = 19, p-value = 1
>   # alternative hypothesis: true correlation is not equal to 0
>   # sample estimates: cor 0
>   cor.test(x,y,method="kendall")
>
>   #       Kendall's rank correlation tau
>
>   # z = 0, p-value = 1
>   # alternative hypothesis: true tau is not equal to 0
>   # sample estimates: tau 0
>   cor.test(x,y,method="spearman")
>   #      Spearman's rank correlation rho
>   # S = 1540, p-value = 1
>   # alternative hypothesis: true rho is not equal to 0
>   # sample estimates: rho 0
>
>  If you wanted, for instance, that the "method=kendall" should
>  announce that it is testing "H0: x and y are independent" then
>  it would seriously mislead the reader!
>
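To make Ted's point concrete, here is a small follow-up sketch (mine, not from the original message): the same x and y are functionally dependent, and a monotone re-expression of x makes that dependence visible to the very same coefficient.

```r
## Sketch: y is a function of x, yet all three measures report 0.
## A monotone transform of x (here abs(x)) exposes the dependence.
x <- -10:10
y <- x^2 - mean(x^2)
cor(x, y, method = "kendall")        # 0: no monotone association with x
cor(abs(x), y, method = "kendall")   # 1: y increases monotonically in abs(x)
```

So "zero association" by these coefficients rules out monotone (or, for Pearson, linear) dependence only, not dependence in general.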
I did take the null statement from the description of
Kendall::Kendall() ("Computes the Kendall rank correlation and its
p-value on a two-sided test of H0: x and y are independent."). Here,
perhaps "monotonically independent" (as opposed to "functionally
independent") would have been more appropriate.

Still, this very example seems to support my original idea: users can
easily get confused about the exact null of a test. Does it test
for "association" or for "no association", for "normality" or for
"lack of normality"? Printing a precise and appropriate statement of
the null would help in interpreting the results, and in avoiding
misinterpretation.
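As a sketch of what is already available (my own illustration, not part of the thread): objects of class "htest" carry `null.value` and `alternative` components for some tests, so a print method could state the null explicitly; for other tests these components are absent.

```r
## Sketch: cor.test() stores machine-readable hypothesis pieces,
## while shapiro.test() stores none, so its print method has
## nothing of the sort to report.
ct <- cor.test(1:10, c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9))
ct$null.value                    # named 0, i.e. "H0: correlation = 0"
ct$alternative                   # "two.sided" (the default)
names(shapiro.test(rnorm(30)))
## "statistic" "p.value" "method" "data.name" -- no null.value here
```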



>  > Here both "H0: x is normal" and "Ha: x is not normal" are missing. At
>  > least to beginners, these things are not always perfectly clear (even
>  > after reading the documentation), and when interpreting the results it
>  > can prove useful to have on-screen information about the null.
>
> This is possibly a more discussable point, in that even if you know
>  what the Shapiro-Wilk statistic is, it is not obvious what it is
>  sensitive to, and hence what it might be testing for. But I doubt
>  that someone would be led to try the Shapiro-Wilk test in the
>  first place unless they were aware that it was a test for normality,
>  and indeed this is announced in the first line of the response.
>  The alternative, therefore, is "non-normality".
>
To be particularly picky, as statistics demands, this is not so
obvious from the print-out. For the Shapiro-Wilk test one could
indeed deduce that, since it is a "test of normality", the null
tested is "H0: the data are normal". This would not hold for, say,
the Pearson correlation. In loose language, it estimates and tests
for "correlation"; in more statistically precise language, it tests
for "no correlation" (or "no association"). It feels to me that
without appropriate indicators, one can easily end up playing with
fire.



>  As to the contrast between absence of an "Ha" statement for the
>  Shapiro-Wilk, and its presence in cor.test(), this comes back to
>  the point I made earlier: cor.test() offers you three alternatives
>  to choose from: "two-sided" (default), "greater", "less". This
>  distinction can be important, and when cor.test() reports "Ha" it
>  tells you which one was used.
>
>  On the other hand, as far as Shapiro-Wilk is concerned there is
>  no choice of alternatives (nor of anything else except the data x).
>  So there is nothing to tell you! And, further, departure from
>  normality has so many "dimensions" that alternatives like
>  "two-sided", "greater" or "less" would make no sense. One can think of
>  tests targeted at specific kinds of alternative such as "Distribution
>  is excessively skew" or "distribution has excessive kurtosis" or
>  "distribution is bimodal" or "distribution is multimodal", and so on.
>  But any of these can be detected by Shapiro-Wilk, so it is not
>  targeted at any specific alternative.
>
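Ted's contrast shows up directly in the function interfaces; a quick sketch of my own:

```r
## Sketch: cor.test() takes an explicit alternative, which its print
## method then reports as "Ha"; shapiro.test() takes only the data.
x <- 1:20
y <- c(x[-1], 1)                                      # arbitrary paired data
cor.test(x, y, alternative = "greater")$alternative   # "greater"
cor.test(x, y, alternative = "less")$alternative      # "less"
args(shapiro.test)                                    # function (x) -- nothing else
```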
Thank you for these explanations. Best
Liviu


