[Rd] unexpected behavior of rpart 3.1-43, loss matrix

Lars sich at gmx.de
Thu Apr 30 18:43:18 CEST 2009


I just noticed that rpart behaves unexpectedly when performing
classification learning with a user-specified loss matrix.
If the response variable y is a factor and not all levels of the
factor occur in the observations, rpart exits with an error:

> df=data.frame(attr=1:5,class=factor(c(2,3,1,5,3),levels=1:6))
> rpart(class~attr,df,parms=list(loss=matrix(0,6,6)))
Error in (get(paste("rpart", method, sep = ".")))(Y, offset, parms, wt) :
  Wrong length for loss matrix

Note that while the levels of the factor range from 1 to 6, only
levels 1, 2, 3 and 5 occur in the observed data.

The error is caused by this code in rpart.class:

 fy <- as.factor(y)
 y <- as.integer(fy)
 numclass <- max(y[!is.na(y)])

temp2 <- parms$loss
if (length(temp2) != numclass^2)
  stop("Wrong length for loss matrix")

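For the example, numclass is set to 5 (the largest level code that
is observed) instead of 6, so the 36-element loss matrix fails the
check against numclass^2 = 25.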
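
The mismatch can be reproduced in base R without calling rpart. This is
only a sketch of the arithmetic in the excerpt above; the variable names
numclass_observed and numclass_declared are mine and do not appear in
the rpart source.

```r
# Same response as in the example: six declared levels, four observed.
y <- factor(c(2, 3, 1, 5, 3), levels = 1:6)
codes <- as.integer(y)                      # level codes: 2 3 1 5 3

# What the quoted excerpt computes: the largest observed code.
numclass_observed <- max(codes[!is.na(codes)])

# The number of levels the factor declares.
numclass_declared <- nlevels(y)

# Length of the 6 x 6 loss matrix that is compared to numclass^2.
loss_length <- length(matrix(0, 6, 6))

numclass_observed      # 5
numclass_declared      # 6
loss_length            # 36, which is 6^2 but not 5^2
```

Using nlevels() in place of the maximum observed code would give 6 here;
whether that is the right change for rpart.class as a whole is not
something this sketch can show.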

While for that small example it is debatable whether numclass
should be 6, consider a data set whose response variable has a fixed
set of levels. For some samples of the data, not all levels of the
response variable will occur. At the same time, it is desirable to
use the same loss matrix whenever a decision tree is trained from
such data.
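
One possible workaround, sketched here under assumptions, is to keep a
single loss matrix over all declared levels and cut it down to the
levels that occur in each sample. The names full_loss and sub_loss and
the unit off-diagonal costs are hypothetical. The sketch only shows that
the reduced matrix has the length the quoted check expects; I have not
verified that rpart accepts it in every other respect, so the rpart call
is left as a comment.

```r
df <- data.frame(attr = 1:5,
                 class = factor(c(2, 3, 1, 5, 3), levels = 1:6))

# Hypothetical shared loss matrix over all six declared levels:
# zero on the diagonal, unit cost everywhere else.
lev <- levels(df$class)
full_loss <- matrix(1, 6, 6, dimnames = list(lev, lev))
diag(full_loss) <- 0

# Drop the unused levels, then keep only the matching rows and columns.
df$class <- droplevels(df$class)
keep <- levels(df$class)                 # "1" "2" "3" "5"
sub_loss <- full_loss[keep, keep]

# After droplevels() the level codes run from 1 to 4, so the maximum
# observed code equals the number of levels.
numclass <- max(as.integer(df$class))

# Assumed usage, not run here:
# rpart(class ~ attr, df, parms = list(loss = sub_loss))
```

The drawback is that the fitted tree then knows nothing about the
dropped levels, so this sidesteps the length check rather than making
rpart use the declared levels.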

Having said that, I am very happy with the rpart package and its
high configurability.

Best regards
