[R] gamma distribution

Christoph Buser buser at stat.math.ethz.ch
Thu Jul 28 11:19:06 CEST 2005


As Uwe mentioned, be careful about the difference between the
significance level alpha and the power of a test.

To do power calculations you must specify an alternative
hypothesis H_A. For example, suppose you have two populations you
want to compare, and assume that they are normally distributed
(with equal, unknown variance for simplicity). We are interested in
whether there is a difference in the means and want to use t.test().
Our null hypothesis is H_0: there is no difference in the means.

To do a power calculation for our test, we first have to specify
an alternative, e.g. H_A: the mean difference is 1 (unit).
For a fixed number of observations we can then calculate the power
of our test. Here it is the probability that our test is
significant if the true, unknown difference is 1 (that is, if H_A
is true). In other words, if I repeat the test many times, always
taking samples with a mean difference of 1, the number of
significant tests divided by the total number of tests is an
estimate of the power. If the samples are instead drawn under H_0
(no difference), the same proportion estimates the significance
level alpha, not the power.
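The difference between level and power can be sketched in R by simulation. This is a minimal sketch, not a definitive implementation: the sample size (20 per group), the standard deviation (1) and the helper name `reject.rate` are illustrative assumptions. Simulating with a true difference of 0 (H_0 true) estimates the significance level; simulating with a true difference of 1 (H_A true) estimates the power. `power.t.test()` from the stats package gives the analytic power of the same design for comparison.

```r
## Hypothetical helper: fraction of significant equal-variance t-tests when
## the true mean difference is `delta` (assumes normal data with sd = 1).
reject.rate <- function(delta, nobs = 20, nsim = 1000, alpha = 0.05) {
  pval <- numeric(nsim)
  for (i in seq_len(nsim)) {
    x <- rnorm(nobs, mean = 0, sd = 1)
    y <- rnorm(nobs, mean = delta, sd = 1)
    pval[i] <- t.test(x, y, var.equal = TRUE)$p.value
  }
  mean(pval <= alpha)  # significant tests / total number of tests
}

set.seed(1)
level.est <- reject.rate(delta = 0)  # H_0 true: estimates alpha
power.est <- reject.rate(delta = 1)  # H_A true: estimates the power
## Analytic power of the same design, for comparison
power.exact <- power.t.test(n = 20, delta = 1, sd = 1,
                            sig.level = 0.05)$power
c(level = level.est, power = power.est, power.exact = power.exact)
```

With 1000 simulations the estimated power has a Monte Carlo standard error of roughly sqrt(p(1-p)/1000), about 0.01 here, so the simulated and analytic power should agree to within a few hundredths.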

In your case the situation is a little more complicated. You
need to specify an alternative hypothesis.
In one of your first examples you draw samples from two gamma
distributions with different shape parameters and the same
scale. But by varying the shape parameter, the two distributions
differ not only in their mean but also in their form.
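To see why, recall that a gamma distribution with shape k and scale s has mean k*s, variance k*s^2 and skewness 2/sqrt(k). A minimal sketch (the helper `gamma.moments` is hypothetical, and scale = 1 is an arbitrary choice) compares shape 2 with shape 20 and checks the theoretical means against random draws:

```r
## Hypothetical helper: theoretical moments of a gamma(shape, scale) distribution
gamma.moments <- function(shape, scale = 1) {
  c(mean     = shape * scale,
    variance = shape * scale^2,
    skewness = 2 / sqrt(shape))
}

m <- rbind(shape2  = gamma.moments(2),
           shape20 = gamma.moments(20))
m

## Monte Carlo check of the means (same scale for both samples)
set.seed(1)
sim.means <- c(shape2  = mean(rgamma(1e5, shape = 2,  scale = 1)),
               shape20 = mean(rgamma(1e5, shape = 20, scale = 1)))
sim.means
```

Going from shape 2 to shape 20 multiplies the mean by 10 and reduces the skewness from about 1.41 to about 0.45, so a rejection alone cannot tell you which of these differences a test has detected.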
I received an email from Prof. Ripley in which he explained, in
detail and very precisely, some examples of tests and what they
are testing. It was a follow-up to the first posts about t-tests
and the Wilcoxon test.
I have attached the email below and recommend reading it carefully.
It might be helpful for you, too.


Christoph Buser

Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
phone: x-41-44-632-4673		fax: 632-1228

From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
To: Christoph Buser <buser at stat.math.ethz.ch>
cc: "Liaw, Andy" <andy_liaw at merck.com>
Subject: Re: [R] Alternatives to t-tests (was Code Verification)
Date: Thu, 21 Jul 2005 10:33:28 +0100 (BST)

I believe there is rather more to this than Christoph's account.  The 
Wilcoxon test is not testing the same null hypothesis as the t-test, and 
that may very well matter in practice and it does in the example given.

The (default in R) Welch t-test tests a difference in means between two 
samples, not necessarily of the same variance or shape.  A difference in 
means is simple to understand, and is unambiguously defined at least if 
the distributions have means, even for real-life long-tailed 
distributions.  Inference from the t-test is quite accurate even a long 
way from normality and from equality of the shapes of the two 
distributions, except in very small sample sizes.  (I point my beginning 
students at the simulation study in `The Statistical Sleuth' by Ramsey and 
Schafer, stressing that the unequal-variance t-test ought to be the 
default choice as it is in R.  So I get them to redo the simulations.)
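In the spirit of those simulations, here is a minimal sketch that estimates the level of the pooled (equal-variance) and Welch t-tests when both groups have the same mean but a small group has a large variance. The sample sizes, standard deviations and the helper `ttest.level` are illustrative assumptions, not taken from the book.

```r
## Hypothetical helper: rejection rate under H_0 (equal means) with unequal
## group sizes and variances; var.equal = FALSE is R's default (Welch).
ttest.level <- function(var.equal, nsim = 2000, alpha = 0.05) {
  pval <- numeric(nsim)
  for (i in seq_len(nsim)) {
    x <- rnorm(10, mean = 0, sd = 3)  # small group, large spread
    y <- rnorm(40, mean = 0, sd = 1)  # large group, small spread
    pval[i] <- t.test(x, y, var.equal = var.equal)$p.value
  }
  mean(pval <= alpha)
}

set.seed(1)
level.pooled <- ttest.level(var.equal = TRUE)
level.welch  <- ttest.level(var.equal = FALSE)
c(pooled = level.pooled, welch = level.welch)
```

In this configuration the pooled test rejects far more often than the nominal 5% even though H_0 is true, while the Welch test stays close to 5%, which is the behaviour that makes the unequal-variance test a sensible default.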

The Wilcoxon test tests a shift in location between two samples from 
distributions of the same shape differing only by location.  Having the 
same shape is part of the null hypothesis, and so is an assumption that 
needs to be verified if you want to conclude there is a difference in 
location (e.g. in means).  Even if you assume symmetric distributions (so 
the location is unambiguously defined) the level of the test depends on 
the shapes, tending to reject equality of location in the presence of 
difference of shape.  So you really are testing equality of distribution,
both location and shape, with power concentrated on location shifts.
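The dependence of the Wilcoxon level on the shapes can be checked with a small simulation. A minimal sketch, assuming two symmetric normal samples both centred at 0 (so the location is unambiguous and equality of location holds), with 10 observations of standard deviation 4 against 40 of standard deviation 1; these settings and the helper `wilcox.level` are illustrative choices:

```r
## Hypothetical helper: rejection rate of wilcox.test when both samples are
## centred at 0; only the spread of the first sample (sd.x) varies.
wilcox.level <- function(sd.x, nsim = 2000, alpha = 0.05) {
  pval <- numeric(nsim)
  for (i in seq_len(nsim)) {
    x <- rnorm(10, mean = 0, sd = sd.x)
    y <- rnorm(40, mean = 0, sd = 1)
    pval[i] <- wilcox.test(x, y)$p.value
  }
  mean(pval <= alpha)
}

set.seed(1)
level.same <- wilcox.level(sd.x = 1)  # identical distributions
level.diff <- wilcox.level(sd.x = 4)  # same centre, different shape
c(same.shape = level.same, different.shape = level.diff)
```

With identical distributions the rejection rate stays near the nominal 5%; with the same centre but a different spread it rises well above 5%, so a "significant" result here reflects the difference in shape rather than in location.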

Given samples from gamma(shape=2) and gamma(shape=20) distributions, we 
know what the t-test is testing (equality of means).  What is the Wilcoxon 
test testing?  Something hard to describe and less interesting, I believe.

BTW, I don't see the value of the gamma simulation, as it
simultaneously changes mean and shape between the samples.  How about
checking with the mean held the same:

n <- 1000
z1 <- z2 <- numeric(n)
for (i in 1:n) {
   ## same mean (shape/rate = 25) but different shapes
   x <- rgamma(40, 2.5, 0.1)
   y <- rgamma(40, 10, 0.1*10/2.5)
   z1[i] <- t.test(x, y)$p.value
   z2[i] <- wilcox.test(x, y)$p.value
}
## Level: proportion of p-values <= 0.05 when the means are equal
1 - sum(z1>0.05)/n  ## 0.049
1 - sum(z2>0.05)/n  ## 0.15

Here the Wilcoxon test is shown to be a poor test of equality of means.
Christoph's simulation shows that it is able to use differences in shape as
well as location in the test of these two distributions, whereas the 
t-test is designed only to use the difference in means.  Why compare the 
power of two tests testing different null hypotheses?
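One way to see what the Wilcoxon test picks up in the same-mean gamma example: in R its two-sample statistic W counts the pairs (x_i, y_j) with x_i > y_j, so W/(n_x n_y) estimates P(X > Y), and the test is sensitive to departures of this probability from 1/2. A sketch using the gamma parameters from the simulation above; the large sample size, the seed and the 40-point subsamples are arbitrary choices:

```r
set.seed(1)
N <- 1e5
x <- rgamma(N, shape = 2.5, rate = 0.1)            # mean 2.5 / 0.1 = 25
y <- rgamma(N, shape = 10,  rate = 0.1 * 10 / 2.5) # mean 10 / 0.4 = 25
means <- c(x = mean(x), y = mean(y))
means

## Independent pairs (x[i], y[i]) give a Monte Carlo estimate of P(X > Y)
p.x.gt.y <- mean(x > y)
p.x.gt.y

## W from wilcox.test() equals the number of pairs with x_i > y_j
xs <- x[1:40]
ys <- y[1:40]
W <- unname(wilcox.test(xs, ys)$statistic)
W.direct <- sum(outer(xs, ys, ">"))
c(W = W, W.direct = W.direct, W.over.pairs = W / (40 * 40))
```

Although both distributions have mean 25, the more skewed gamma(2.5) sample falls below the other more often than not, so P(X > Y) is below 1/2. This is one way of seeing why the Wilcoxon test rejects in Ripley's simulation while the t-test, which targets the means, keeps its level.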

I would say a very good reason to use a t-test is if you are actually 
interested in the hypothesis it tests ....

pantd at unlv.nevada.edu writes:
 > Thanks for your response. BTW, I am calculating the power of the Wilcoxon test. I
 > divide the total number of rejections by the number of simulations. So for 1000
 > simulations, at the 0.05 level of significance, if the number of rejections is 50 then
 > the power will be 50/1000 = 0.05. That's why I'm importing the p-values into Excel.
 > Is my approach correct?
 > Thanks and regards,
 > -dev
