Variations on the t test

Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
26 Aug 1998 00:01:45 +0200


One of the things that have been annoying me with both Splus and R is
that the "simple tests" are inconsistent with the lm/glm/dataframe
conventions, and that they become quite awkward to use when data have
to be extracted from a dataframe:

t.test(data$bp[data$sex=="F" & data$age>25], data$bp[data$sex=="M" &
data$age>25])

OK, so it's better to do 

eval(t.test(bp[sex=="F"],bp[sex=="M"]),subset(data,age>25))

but try explaining that to a class of MD's!


Below is a first sketch of a set of functions that extend t.test to
allow specification in terms of a list of vectors or a vector and a
grouping variable or a model formula, allowing data= and subset=
arguments as well. The paired test can also be specified using a
Pairs(x,y) construction, which should help prevent people from using
the wrong test. It's supposed to be downward compatible with the Splus
syntax.

I've been pushing Kurt to use a similar interface for all the ctest
functions, so I thought you'd all like to see them. If nothing else,
they should be interesting to study (they were definitely fun to
write), since they use some *really* dirty tricks relating to R's
evaluation model.

Anyway, here goes (remember, I said *first* sketch - there is at least
one bug in it, can you spot it?):

----
###
### The first couple of lines just guard against multiple loading
###

if(exists("t.test",envir=.GlobalEnv))rm(t.test)
if(exists("t.test.default",envir=.GlobalEnv))rm(t.test.default)

### Utility function to extract the environment the call is evaluated
### in. If you do eval(x,list), list is found as the variable "envir"
### in the stack frame immediately below the local frame

.ParentEnv<-function()
{
  parent<-sys.parent()
  grandparent <- sys.parent(2)
  if ( parent - grandparent == 1 )
    sys.frame(grandparent)
  else
    get("envir",envir=sys.frame(parent-1))
}

.t.test<-t.test

t.test.default<-function(x,...,group)
{
  e<-.ParentEnv()
  call <- if ( is.list(x) )
    substitute(t.test.list(x,...))
  else if (missing(group))
    substitute(.t.test(x,...))
  else 
    substitute(t.test.list(split(x, group),...))
  eval(call,e)
}

t.test<-function(...,data=sys.frame(sys.parent()),subset)
{  
  dname.add<-
  if (!missing(data)) 
    paste(", data frame:", deparse(substitute(data))) 
  else
    ""
  if (!missing(subset)){ 
    dname.add<-paste(dname.add, ", subset:", deparse(substitute(subset))) 
    subset <- eval(substitute(subset),data)
    data <- data[subset,] # had better be a data frame...
  }
  res<-eval(substitute(t.test.generic(...)), data)
  res$data.name <- paste(res$data.name, dname.add)
  res
}


t.test.generic<-function(x,...)
{
  UseMethod("t.test")
}

t.test.list<-function(l,...)
{
  if (length(l) != 2)
    stop("need exactly 2 groups")
  res<-.t.test(l[[1]],l[[2]],...)
  res$data.name<-deparse(substitute(l))
  nn<-names(l)
  if (length(nn) != 2)
    nn<-c("group 1", "group 2")
  names(res$estimate)<-nn
  res
}

t.test.formula<-function(f,...)
{
  e<-.ParentEnv()
  f[[1]]<-as.name("split")
  call<-as.call(list(as.name("t.test.list"), f, ...))
    eval(call,e)
}

Pairs<-function(x,y)
  structure(match.call(),class="paired")

t.test.paired<-function(l,...)
{
  e<-.ParentEnv()
  call<-c(as.list(l),list(...),list(paired=T))
  call[[1]]<-as.name(".t.test")
  call<-as.call(call)
  eval(call,e)
}




-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._