[Rd] further notes on model.frame issue

Terry Therneau therneau at mayo.edu
Mon Jan 19 19:29:37 CET 2009

 This is a follow-up on my note of Saturday.  Let me start with two important 
    - I think this would be a nice addition, but I've had exactly one use for it 
in the 15+ years of developing the survival package.  
    - I have a work around for the current case.   
Prioritize accordingly.

The ideal would be to change survexp as follows:
    fit <- survexp( ~ gender, data=mydata, ratetable=survexp.us,
    	ratevar=list(sex=gender, year=enroll.dt, age=age*365.25))
The model statement says that I want separate curves by gender, and is similar 
to other model statements.

The ratevar option gives the mapping between my variable names and the dimnames 
of the survexp.us rate table.  It wants age in days, enrollment date to be some 
sort of date object, and sex to be a factor.  Then the heading of the R code 
would be
	m <- match.call()
	m <- m[c(1, match(names(m), c('data','formula','na.action', 'subset',
		                      'weights', 'ratevar'), nomatch=0)
        m[[1]] <- as.name('model.frame')
        m <- eval.parent(m)

That is, the variables enroll.dat and age are searched for in the data= arg. 
This is like the start opion in glm, but a more complex result than a vector.

  The model.frame function can't handle this.  (Splus fails too, same spot, less 
useful error message).  

  The current code uses 
  	fit <- survexp(~ gender + ratetable(sex=gender, year=enroll.dt, 
  	      data=mydata, ratetable=survexp.us)
The ratetable function creates a matrix with extra attributes. The matrix 
contains as.numeric of the factors with the levels remembered as an extra 
attribute, and also looks out for dates.  So the result is like ns() in the eyes 
of model.frame, and it works.  But having to write gender twice on the rhs is 
confusing to users.  

    Thanks in advance for any comments.
    	Terry Therneau

More information about the R-devel mailing list