[R] wrapper for coxph with a subset argument

Erik Iverson iverson at biostat.wisc.edu
Fri Nov 9 18:04:52 CET 2007


Dear R-help -

Thanks to those who replied yesterday (Christos H. and Thomas L.)
regarding my question on coxph and model formulas; the answers worked
perfectly.

My new question involves the following.

I want to run several coxph models (package survival) with the same 
dataset, but different subsets of that dataset.

I have found a way to do this, described below in functions subwrap1 and 
subwrap2.  These do not use the coxph "subset" argument, however, as you 
will see.

My three main questions are:

1) When writing a wrapper like this, should I be using the subset 
argument in coxph(), or alternatively, doing what I am doing in subwrap1 
and subwrap2 below?  Is the subset argument in coxph more of a 
convenience tool for interactive use rather than programs?

2) If the approach in subwrap1 and subwrap2 is fine, is there a 
preference for using 'expressions' or 'strings'?  Eventually, my program 
will create these subset conditions programmatically, so I think strings 
will be the way I have to go, even though I've seen warnings on this 
list about using the eval(parse()) construct.

3) Is there some approach to do this that I'm overlooking?  My goal will 
be to produce a list of subset conditions (probably a character vector), 
and then use lapply to run the various cox regressions.

I can already achieve my goal; I would just like to know more about
how others do things like this.
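
To make the goal in question 3 concrete, a minimal sketch might look like the following. It assumes the survival package is installed, generates its own small test data, and reuses the "subset the data frame first" idea from subwrap2. The names dat, conds and fits are hypothetical, chosen only for illustration.

```r
library(survival)

set.seed(1)  # only so the sketch is reproducible
n <- 200
dat <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.3),
  trt   = rep(c("A", "B"), each = n / 2),
  sex   = rep(c("M", "F"), times = n / 2)
)

# Hypothetical character vector of subset conditions
conds <- c(female = "sex == 'F'", male = "sex == 'M'")

# Fit one model per condition by subsetting the data frame first
fits <- lapply(conds, function(cond) {
  keep <- eval(parse(text = cond), dat)  # logical vector, evaluated inside dat
  coxph(Surv(times, event) ~ trt, data = dat[keep, ])
})

sapply(fits, function(f) f$n)  # number of rows used in each fit
```

Because conds is a named vector, the resulting list carries the same names, so individual fits can be picked out as fits$female or fits$male.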

I've simplified my code below to focus on where I feel I'm confused. 
Here is some code along with comments:

#### BEGIN R SAMPLE CODE

library(survival)  # provides coxph() and Surv()

# Function for producing test data (n should be even)
makeTestDF <- function(n) {
   times  <- sample(1:200, n, replace = TRUE)
   event  <- rbinom(n, 1, prob = .1)
   trt    <- rep(c("A","B"), each = n/2)
   sex    <- factor(c("M","F"))
   sex    <- rep(sex, times = n/2)
   data.frame(times, event, trt, sex)  # return the data frame visibly
}

# Make test data, n = 200
testdf <- makeTestDF(200)

# Cox wrapper function with subset, this one works
# Takes subset as expression
subwrap1 <- function(x, sb) {
   sb <- eval(substitute(sb), x)
   x <- x[sb,]
   coxph(Surv(times,event)~trt, data = x)
}

subwrap1(testdf, sex == 'F')

# This next one also works, but uses a character variable
# instead of an expression as the subset argument

subwrap2 <- function(x, sb) {
   sb <- eval(parse(text = sb), x)
   x <- x[sb,]
   coxph(Surv(times,event)~trt, data = x)
}

subwrap2(testdf, "sex == 'F'")

# Neither of the above uses the coxph subset argument.
# If I try using that with expressions, I get stuck.
# I've tried many different things in the subset
# argument, but none seem to do the trick.  Is using
# this argument in a program even advisable?

subwrap3 <- function(x, sb) {
   coxph(Surv(times,event)~trt, data = x,
   subset = eval(substitute(sb), x))
}

subwrap3(testdf, sex == 'F') #does not work

# Using a string, this works, however.

subwrap4 <- function(x, sb) {
   coxph(Surv(times,event)~trt, data = x, subset = eval(parse(text=sb)))
}

subwrap4(testdf, "sex == 'F'")

### END R SAMPLE CODE
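
One pattern that may allow the subset argument to be used from inside a wrapper is to build the coxph call with the unevaluated condition spliced in, and only then evaluate it. The sketch below is an illustration under assumptions, not a confirmed fix: the names subwrap5, dat and the placeholder COND are hypothetical, and it has not been checked against the R 2.5.1 / survival 2.32 versions listed in the session info.

```r
library(survival)

set.seed(1)  # only so the sketch is reproducible
n <- 200
dat <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.3),
  trt   = rep(c("A", "B"), each = n / 2),
  sex   = rep(c("M", "F"), times = n / 2)
)

# Hypothetical wrapper: splice the unevaluated condition into the call,
# so coxph sees subset = sex == "F" rather than a reference to 'sb'
subwrap5 <- function(x, sb) {
  cl <- substitute(
    coxph(Surv(times, event) ~ trt, data = x, subset = COND),
    list(COND = substitute(sb))  # only COND is replaced; x stays a symbol
  )
  eval(cl)  # evaluated in this function's frame, where x is defined
}

fit5 <- subwrap5(dat, sex == 'F')
fit5$n  # number of rows used in the fit
```

If the condition arrives as a string instead, parse(text = cond)[[1]] could presumably take the place of substitute(sb) in the list, since it also yields an unevaluated call.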

Thanks so much,
Erik Iverson
iverson at biostat.wisc.edu

 > sessionInfo()
R version 2.5.1 (2007-06-27)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "grDevices" "datasets"  "tcltk"     "splines"   "graphics"  "utils"
[7] "stats"     "methods"   "base"

other attached packages:
        debug     mvbutils SPLOTS_1.2-6        Hmisc        chron     survival
      "1.1.0"      "1.1.1"      "1.2-6"      "3.4-2"     "2.3-13"       "2.32"


