[R] rpart with interval censored data crashes R

Keith Jewell k.jewell at campden.co.uk
Fri Jan 9 15:04:14 CET 2009


Hi Everyone,

This example code makes R 'crash'; that is, the R application closes with 
no warning or error message.
#-----------------------
myD <- read.table(stdin(), header=TRUE, nrows=20)
   Broth Salt   pH Temp    N  Y Growth
1    310  9.0 2.92   10 90.0 NA      0
2    615  6.0 7.82   30  1.0  2      1
3    217  2.0 7.34   10  7.0  8      1
4    338 10.0 4.44   10 90.0 NA      0
5    240  4.0 7.33   10 20.0 21      1
6    336 10.0 3.90   10 90.0 NA      0
7    279  7.0 6.73   10 90.0 NA      0
8   1021  9.0 5.03   45  8.0  9      1
9    974  7.0 4.01   45 90.0 NA      0
10   265  7.0 2.93   10 90.0 NA      0
11   934  4.0 5.28   45  0.1  1      1
12   669  9.0 5.03   30 90.0 NA      0
13   875 10.0 6.24   37  1.0  2      1
14   385  2.0 5.84   20  1.0  2      1
15   562  2.0 5.84   30  0.1  1      1
16   718  0.5 5.54   37  0.1  1      1
17   845  9.0 5.03   37  3.0  6      1
18   913  2.0 5.84   45  0.1  1      1
19   577  4.0 4.10   30 90.0 NA      0
20    20  0.5 7.44    8 24.0 27      1

library(rpart)
library(survival)
fit<-rpart(Surv(N,Y,type="interval2")~Salt+pH+Temp, data=myD)
#---------------------
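For anyone unfamiliar with the coding: with type="interval2", Surv() takes 
N and Y as the lower and upper ends of the interval in which growth 
occurred. A missing upper end (Y = NA, the Growth = 0 rows) is read as 
right-censored at N, so the response above mixes right-censored and 
interval-censored rows. The short sketch below uses two rows taken from 
the data and does not call rpart. It shows how survival stores them; the 
status codes are as I understand them from ?Surv.

#-----------------------
library(survival)
## Row 1 (no growth by N = 90) and row 2 (growth between N = 1 and Y = 2)
s <- Surv(c(90, 1), c(NA, 2), type = "interval2")
attr(s, "type")    # the stored Surv type: "interval"
## Third column is the status code: 0 = right-censored, 3 = interval-censored
unclass(s)[, 3]
#---------------------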

Professor Ripley helpfully pointed out that the documentation does not say 
that interval censoring is supported, and indeed the crash seems to happen 
only with interval-censored data.
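Until the restriction is documented or checked inside rpart, a caller 
could test the response type first, so that unsupported censoring gives 
an ordinary R error instead of closing R. This is a minimal sketch that 
assumes only right-censored Surv objects are safe to pass to rpart. The 
wrapper checked_rpart_surv() is hypothetical and not part of rpart.

#-----------------------
library(rpart)
library(survival)

## Hypothetical wrapper, not part of rpart. Assumption: only right-censored
## Surv responses are safe for rpart, so anything else is refused with an
## ordinary R error before rpart is ever called.
checked_rpart_surv <- function(formula, data, ...) {
    ## Evaluate the left-hand side of the formula (the response) in 'data'
    resp <- eval(formula[[2]], data, environment(formula))
    if (inherits(resp, "Surv") && attr(resp, "type") != "right")
        stop("Surv type '", attr(resp, "type"),
             "' is not supported by rpart; use right-censored data")
    rpart(formula, data = data, ...)
}
#---------------------

With the data above, checked_rpart_surv(Surv(N, Y, type="interval2") ~ 
Salt + pH + Temp, data = myD) should then stop with an error instead of 
crashing.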

?rpart indicates that the dependent variable may be a survival object. 
Neither ?rpart nor "An Introduction to Recursive Partitioning Using the 
RPART Routines" (Therneau et al. 1997) says that the dependent variable 
may contain interval-censored data, but neither says that it may not; 
i.e. as far as I am aware (!) this restriction is not documented.

This post has three purposes:

1) Bring this behaviour - especially the crash in response to 'bad' data - 
to the attention of the authors.

2) Seek an explanation of the restriction (if intentional). Perhaps 
naively, it seems to me that interval-censored data should be easier to 
handle than left- or right-censored data; after all, the information 
content is greater.

3) Seek guidance on how to work around the problem. I'm minded to replace 
the interval-censored data with the midpoints of the intervals. Does anyone 
have any comments on such an approach?
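For concreteness, the midpoint replacement in point 3 might look like the 
following sketch, which assumes the coding used in the example. Rows with 
Growth == 1 get the exact time (N + Y)/2, and rows with Growth == 0 stay 
right-censored at N. The new columns midT and event are illustrative 
names, and Broth is left out because the model does not use it. Treating 
a midpoint as an exactly observed time ignores the uncertainty within 
each interval.

#-----------------------
library(rpart)
library(survival)

## The example data (Broth omitted), entered directly so this runs on its own
myD <- data.frame(
  Salt = c(9, 6, 2, 10, 4, 10, 7, 9, 7, 7, 4, 9, 10, 2, 2, 0.5, 9, 2, 4, 0.5),
  pH = c(2.92, 7.82, 7.34, 4.44, 7.33, 3.90, 6.73, 5.03, 4.01, 2.93,
         5.28, 5.03, 6.24, 5.84, 5.84, 5.54, 5.03, 5.84, 4.10, 7.44),
  Temp = c(10, 30, 10, 10, 10, 10, 10, 45, 45, 10,
           45, 30, 37, 20, 30, 37, 37, 45, 30, 8),
  N = c(90, 1, 7, 90, 20, 90, 90, 8, 90, 90,
        0.1, 90, 1, 1, 0.1, 0.1, 3, 0.1, 90, 24),
  Y = c(NA, 2, 8, NA, 21, NA, NA, 9, NA, NA,
        1, NA, 2, 2, 1, 1, 6, 1, NA, 27),
  Growth = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1))

## Midpoint of the interval where growth was seen; otherwise censored at N
myD$midT  <- ifelse(myD$Growth == 1, (myD$N + myD$Y) / 2, myD$N)
myD$event <- myD$Growth   # 1 = growth (midpoint treated as exact), 0 = censored

## An ordinary right-censored Surv response, which rpart does support
fit <- rpart(Surv(midT, event) ~ Salt + pH + Temp, data = myD)
print(fit)
#---------------------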

Any comments gratefully received.

Keith Jewell
==========================================
Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = Patched
 major = 2
 minor = 8.1
 year = 2009
 month = 01
 day = 07
 svn rev = 47502
 language = R
 version.string = R version 2.8.1 Patched (2009-01-07 r47502)

Windows Server 2003 x64 (build 3790) Service Pack 2

Locale:
LC_COLLATE=English_United Kingdom.1252;
LC_CTYPE=English_United Kingdom.1252;
LC_MONETARY=English_United Kingdom.1252;
LC_NUMERIC=C;
LC_TIME=English_United Kingdom.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, 
package:utils, package:datasets, package:methods, Autoloads, package:base



