[R] t-test behavior given that the null hypothesis is true

(Ted Harding) Ted.Harding at wlandres.net
Wed Jan 9 15:51:08 CET 2013


Ah! You have assigned a parameter "equal.var=TRUE", and "equal.var"
is not a listed parameter for t.test() -- see ?t.test :

  t.test(x, y = NULL,
    alternative = c("two.sided", "less", "greater"),
    mu = 0, paired = FALSE, var.equal = FALSE,
    conf.level = 0.95, ...)

Try it instead with "var.equal=TRUE", i.e. in your code:
  for(i in 1:k){
    rv.t.pvalues[i] <- t.test(rv[i, 1:(c/2)], rv[i, (c/2+1):c],
  ##equal.var=TRUE, alternative="two.sided")$p.value
    var.equal=TRUE, alternative="two.sided")$p.value
  }

When I run your code with "equal.var", I indeed repeatedly see
the deficient bin for the lowest P-values that you observed.
When I run your code with "var.equal" I do not see it.

The explanation is that, since "equal.var" is not a recognised
parameter for t.test(), it is silently absorbed by the "..."
argument and var.equal keeps its default value FALSE. t.test()
has therefore (since it is a 2-sample test) adopted the
Welch/Satterthwaite procedure:

  var.equal: a logical variable indicating whether to treat
    the two variances as being equal. If 'TRUE' then the
    pooled variance is used to estimate the variance
    otherwise the Welch (or Satterthwaite) approximation
    to the degrees of freedom is used.
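
As a quick check of this explanation, here is a minimal sketch that
compares the misspelt call, the default call and the correctly spelt
call on one pair of samples. The data, the seed and all variable names
are my own illustrative choices, not taken from your code, and the
sketch assumes that t.test() dispatches to its default method, whose
trailing "..." accepts unknown arguments without complaint.

```r
set.seed(1)                     # arbitrary seed, for reproducibility only
x <- rnorm(3); y <- rnorm(3)    # two samples of 3, as in the thread

fit.typo    <- t.test(x, y, equal.var = TRUE)  # misspelt argument name
fit.default <- t.test(x, y)                    # default: var.equal = FALSE
fit.pooled  <- t.test(x, y, var.equal = TRUE)  # correctly spelt argument

# The misspelt call should reproduce the default (Welch) result exactly
same.as.default <- identical(fit.typo$p.value, fit.default$p.value)
typo.is.welch   <- grepl("Welch", fit.typo$method)

# The pooled test has n1 + n2 - 2 degrees of freedom
pooled.df <- unname(fit.pooled$parameter)

# "equal.var" is not among the formal arguments of the default method
known.args    <- names(formals(getS3method("t.test", "default")))
typo.is.known <- "equal.var" %in% known.args

print(c(same.as.default = same.as.default,
        typo.is.welch   = typo.is.welch,
        typo.is.known   = typo.is.known))
print(pooled.df)
```

Checking argument names against names(formals(...)) in this way is one
possible guard against such slips, since t.test() itself gives no
warning.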

This has the effect of somewhat adapting the test procedure to
the data, so that extreme (i.e. small) values of P are even
rarer than they should be.
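
A small simulation can make this visible. With equal group sizes the
pooled and Welch t statistics coincide, while the Welch degrees of
freedom never exceed the pooled n1 + n2 - 2, so each Welch P-value is
at least as large as the corresponding pooled one. The sketch below
runs both tests on the same simulated samples; the variable names and
the seed are my own, and k is smaller than in your code only to keep
the run short.

```r
set.seed(2013)   # arbitrary seed, for reproducibility only
k <- 2000        # number of simulated experiments (smaller than 10000)
n <- 3           # observations per group, as in the thread

p.pooled <- numeric(k)
p.welch  <- numeric(k)
for (i in 1:k) {
  x <- rnorm(n); y <- rnorm(n)   # both groups drawn under the null
  p.pooled[i] <- t.test(x, y, var.equal = TRUE)$p.value
  p.welch[i]  <- t.test(x, y, var.equal = FALSE)$p.value
}

# Fraction of P-values falling below 0.05 under each procedure
frac.pooled <- mean(p.pooled < 0.05)
frac.welch  <- mean(p.welch  < 0.05)
print(c(pooled = frac.pooled, welch = frac.welch))
```

Comparing the two fractions (or hist(p.pooled) against hist(p.welch))
should show the deficit in the lowest bin for the Welch version only.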

With best wishes,
Ted.

On 09-Jan-2013 13:24:59 Pavlos Pavlidis wrote:
> Hi Ted,
> thanks for the reply. I use a similar code which you can see below:
> 
> k <- 10000
> c <- 6
> rv <- array(NA, dim=c(k, c) )
> for(i in 1:k){
>   rv[i,] <- rnorm(c, mean=0, sd=1)
> }
> 
> rv.t.pvalues <- array(NA, k)
> 
> for(i in 1:k){
>   rv.t.pvalues[i] <- t.test(rv[i, 1:(c/2)], rv[i, (c/2+1):c],
> equal.var=TRUE, alternative="two.sided")$p.value
> }
> 
> hist(rv.t.pvalues)
> 
> The histogram is this one:
> http://tinyurl.com/histogram-rt-pvalues-pdf
> 
> all the best
> idaios
> 
> 
> On Wed, Jan 9, 2013 at 12:29 PM, Ted Harding <Ted.Harding at wlandres.net> wrote:
> 
>> On 09-Jan-2013 08:50:46 Pavlos Pavlidis wrote:
>> > Dear all,
>> > I observe a strange behavior of the pvalues of the t-test under
>> > the null hypothesis. Specifically, I obtain 2 samples of 3
>> > individuals each from a normal distribution of mean 0 and variance 1.
>> > Then, I calculate the pvalue using the t-test (var.equal=TRUE,
>> > samples are independent). When I make a histogram of pvalues
>> > I see that consistently the bin of the smallest pvalues has a
>> > lower frequency. Is this a known behavior of the t-test, or is it
>> > a kind of bug/random number generation problem?
>> >
>> > kind regards,
>> > idaios
>>
>> Using the following code, I did not observe the behaviour you describe.
>> The histograms are consistent with a uniform distribution of the
>> P-values, and the lowest bin for the P-values (when the code is
>> run repeatedly) is not consistently lower (or higher, or anything
>> else) than the other bins.
>>
>> ## My code:
>> N <- 10000
>> Ps <- numeric(N)
>> for(i in (1:N)){
>>   X1 <- rnorm(3,0,1) ; X2 <- rnorm(3,0,1)
>>   Ps[i] <- t.test(X1,X2,var.equal=TRUE)$p.value
>> }
>> hist(Ps)
>> ################################################
>>
>> If you would post the code you used, the reason why you are observing
>> this may become more evident!
>>
>> Hoping this helps,
>> Ted.
>>
>> -------------------------------------------------
>> E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
>> Date: 09-Jan-2013  Time: 10:29:21
>> This message was sent by XFMail
>> -------------------------------------------------
>>

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 09-Jan-2013  Time: 14:51:04
This message was sent by XFMail



