[R] Why is rpart() so slow?

Richard A. O'Keefe ok at cs.otago.ac.nz
Fri Mar 19 05:15:06 CET 2004


I asked why rpart is slow.
Patrick Connolly <p.connolly at hortresearch.co.nz> replied:
	You could give us an indication of just what you're trying to
	do, with what, and to what, so we would be in a position to say what
	improvements could be made.
	
The thing that is chugging away now is
	rpart(rgrp ~ y2 + sex, a.frame, a.frame$wt)

where

    rgrp has 21 levels
    y2 has 561 levels
    sex has 2 levels
    wt has values 1..9
    a.frame has 50,500 cases and other variables

I have written decision tree builders (in fact I've published a paper on
the technique), and I really would expect this to zip through in seconds,
not the 4 hours this one has taken so far today (on a 500MHz machine).

Presumably it's something to do with trying to do binary splits and find
good subsets, but I don't *want* binary splits, and I can't figure out from
?rpart how to tell rpart that I don't want binary splits.

(The idea of trying to find an optimal partition of a set of 561 elements
does not fill me with enthusiasm.)
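
For a sense of scale, here is a minimal sketch of how many binary splits a
factor offers, assuming every division of its levels into two nonempty groups
counts as a candidate. Whether rpart really enumerates all of them in this
case is an assumption here, not something ?rpart confirms, and
n_binary_splits is a name made up for the illustration.

```r
# Number of ways to divide k levels into two nonempty groups,
# ignoring which group is called "left": 2^(k-1) - 1.
n_binary_splits <- function(k) 2^(k - 1) - 1

n_binary_splits(2)    # sex: 1 split
n_binary_splits(21)   # a 21-level factor, for comparison: 1048575

# For y2 the count is too large to read comfortably, so show its size instead.
log10(n_binary_splits(561))  # roughly 168.6, i.e. a 169-digit number
```

Any search that visited even a small fraction of those subsets would not
finish, so a run measured in hours is at least consistent with a subset
search over y2.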

Is there perhaps an alternative to rpart that does n-way splits instead of
binary splits?
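
To make the question concrete, here is a minimal sketch of scoring a single
n-way split, with one child per level of the predictor. It is not an existing
package's method: the data are invented (only the column names mirror the
ones above), gini and nway_impurity are hypothetical helpers, and weighted
Gini impurity is just one possible criterion.

```r
# Hypothetical toy data; the names mirror the post but the values are invented.
set.seed(1)
n <- 1000
toy <- data.frame(
  rgrp = factor(sample(letters[1:5], n, replace = TRUE)),
  y2   = factor(sample(1:30, n, replace = TRUE)),
  sex  = factor(sample(c("F", "M"), n, replace = TRUE)),
  wt   = sample(1:9, n, replace = TRUE)
)

# Gini impurity of a vector of (weighted) class totals.
gini <- function(w) {
  total <- sum(w)
  if (total == 0) return(0)
  p <- w / total
  1 - sum(p^2)
}

# Impurity after an n-way split: one child per level of `by`.
nway_impurity <- function(data, response, by, weights) {
  # Weighted contingency table: rows are levels of `by`, columns are classes.
  f <- as.formula(paste(weights, "~", by, "+", response))
  tab <- xtabs(f, data = data)
  child_wt <- rowSums(tab)            # total weight falling in each child
  child_gini <- apply(tab, 1, gini)   # impurity within each child
  sum(child_wt * child_gini) / sum(child_wt)
}

# Impurity decrease for splitting the root n ways on each predictor.
root <- gini(xtabs(wt ~ rgrp, data = toy))
gains <- sapply(c("y2", "sex"),
                function(v) root - nway_impurity(toy, "rgrp", v, "wt"))
print(gains)
```

Each candidate costs one weighted cross-tabulation, so the work per node
grows with the number of cases and levels rather than with the number of
subsets of levels.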



