R-beta: t.test in R

MM Peterson magnus at balhaldie.u-net.com
Sun Feb 22 13:16:18 CET 1998


RE t.test in R

I objected a day or two ago to the behaviour of the one-sample t.test in R,
where it is easy to generate a "confidence interval" for the population mean
that does not contain the sample mean when the null hypothesis is rejected.
It now appears that the same behaviour is latent in the code for the
two-sample version of this test.  The relevant lines from the code for the
t.test function are reproduced below; the two lines I find objectionable,
one in each branch, are those where a value is assigned to tstat.
	

if (var.equal) {
        df <- nx + ny - 2
        v <- ((nx - 1) * vx + (ny - 1) * vy)/df
        stderr <- sqrt(v * (1/nx + 1/ny))
        tstat <- (mx - my - mu)/stderr
}
else {
        stderrx <- sqrt(vx/nx)
        stderry <- sqrt(vy/ny)
        stderr <- sqrt(stderrx^2 + stderry^2)
        df <- stderr^4/(stderrx^4/(nx - 1) +
                stderry^4/(ny - 1))
        tstat <- (mx - my - mu)/stderr
}


I say this problem is LATENT in the code because it is very rare indeed
to apply the two-sample t-test with a proposed null value, other than 0, for
the difference of the means of the populations from which the samples came.
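
To illustrate why a null value of 0 hides the problem, here is a minimal
sketch of the pooled branch.  The helper name ci_from_tstat is hypothetical,
and the way the interval is formed from tstat is an assumption: that step is
not part of the excerpt quoted above.

```r
# Hypothetical helper: pooled two-sample interval built from tstat.
# ASSUMPTION: the interval is formed as (tstat -/+ quantile) * stderr, with
# mu never added back; that step is not shown in the quoted excerpt.
ci_from_tstat <- function(x, y, mu = 0, conf.level = 0.95) {
  nx <- length(x); ny <- length(y)
  df <- nx + ny - 2
  v <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df
  stderr <- sqrt(v * (1/nx + 1/ny))
  tstat <- (mean(x) - mean(y) - mu) / stderr
  q <- qt(1 - (1 - conf.level) / 2, df)
  (tstat + c(-1, 1) * q) * stderr
}

# Illustrative data only.
x <- c(4, 5, 6, 7, 8)
y <- c(4, 5, 6, 7, 8)

ci0  <- ci_from_tstat(x, y, mu = 0)   # centred on mean(x) - mean(y)
ci50 <- ci_from_tstat(x, y, mu = 50)  # centred on mean(x) - mean(y) - 50
print(mean(ci0))
print(mean(ci50))
```

Under that assumption the centre of the interval is mx - my - mu, so with
mu = 0 it coincides with the difference of the sample means and nothing
looks wrong.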

	Nevertheless, if such a case were analysed with a straightforward two-sided
alternative, one would expect the confidence interval for the difference of
the population means, given as part of the output, to be centred on the
observed difference of the sample means.  Instead, the same anomalous
behaviour appears as in the one-sample case, as the following examples show.

> x.sample <- scan()
1: 4 5 6 7 8
6: 
Read 5 items
> y.sample <- x.sample
> t.test(x.sample,y.sample,var.equal=TRUE,mu=50)

	 Two Sample t-test 

data:  x.sample and y.sample 
t = -50, df = 8, p-value = 0 
alternative hypothesis: true difference in means is not equal to 50 
95 percent confidence interval:
 -52.306 -47.694 
sample estimates:
mean of x mean of y 
        6         6 

> t.test(x.sample,y.sample,mu=50)

	 Welch Two Sample t-test 

data:  x.sample and y.sample 
t = -50, df = 8, p-value = 0 
alternative hypothesis: true difference in means is not equal to 50 
95 percent confidence interval:
 -52.306 -47.694 
sample estimates:
mean of x mean of y 
        6         6 
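
As a check on the output above, the interval one would expect can be worked
out by hand using only mean, var and qt.  This is a sketch for the Welch
case; the vector named reported simply copies the interval printed above,
and the variable names otherwise follow the quoted t.test code.

```r
x.sample <- c(4, 5, 6, 7, 8)
y.sample <- x.sample
mu <- 50

nx <- length(x.sample); ny <- length(y.sample)
stderrx <- sqrt(var(x.sample) / nx)
stderry <- sqrt(var(y.sample) / ny)
stderr <- sqrt(stderrx^2 + stderry^2)
# Welch degrees of freedom, as in the else branch of the quoted code.
df <- stderr^4 / (stderrx^4 / (nx - 1) + stderry^4 / (ny - 1))

q <- qt(0.975, df)
diff.means <- mean(x.sample) - mean(y.sample)

# Interval centred on the observed difference of the sample means.
expected <- diff.means + c(-1, 1) * q * stderr
# Interval copied from the t.test output above.
reported <- c(-52.306, -47.694)

print(round(expected, 3))
print(round(reported + mu, 3))
```

Both printed vectors should come out as roughly -2.306 and 2.306, which
suggests that the reported interval is the expected one displaced by -mu.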

I guess the same simple remedy is available in these cases as for the
one-sample test, unless and until changes are made in version 0.61.2.

Magnus Peterson



