[R] multi-class classification using rpart

WeiWei Shi helprhelp at gmail.com
Tue Jan 25 21:14:37 CET 2005


Hi, Andy:
Thanks. It works after I removed the variable. I think I ran into a similar
problem when I used randomForest, though I am not sure whether it had
the same cause.

Unfortunately, in practice that variable is very important to the
accuracy. I am wondering if there is another way besides collapsing
it. BTW, I remember you mentioned an alternative implementation of
randomForest (provided by the author) that avoids the upper limit (32, if I
am correct) on the number of factor levels that the R version of
randomForest can handle.

Thanks for further assistance!

Ed

On Tue, 25 Jan 2005 14:58:04 -0500, Liaw, Andy <andy_liaw at merck.com> wrote:
> > From: WeiWei Shi
> >
> > Hi,
> > I am trying to build a multi-class classification tree using rpart.
> > I tested it on the fgl data from the MASS package and it works well.
> >
> > However, when I used a small sample of my own data, as below, the program
> > seems to take forever. I am not sure if it is just slow or whether there is
> > something wrong with my code or data manipulation.
> >
> > Please advise!
> >
> > The data is described by the output of the str() function. The call to
> > rpart is like:
> >
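> > library(rpart)
> > # Name the response directly: with x$V142 on the left-hand side,
> > # '.' would still include V142 among the predictors.
> > test_tree <- rpart(V142 ~ ., data = x,
> >                    parms = list(split = "gini"), cp = 0.01)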
> >
> > The response variable is V142, with 3 levels.
> >
> > Thanks for your suggestions!
> >
> > Ed.
> 
> [snip]
> 
> >  $ V141: Factor w/ 88 levels "1001","1002",..: 59 59 59 59 59 59 55 78 7 73 ...
> 
> I'd bet this is the problem.  There are 2^(88-1) - 1 possible ways to split
> a factor with 88 levels.  It will work on those splits til the cows come
> home...
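
To put a number on that count, here is a minimal R sketch. The helper name n_binary_splits is made up for illustration and is not part of rpart; it only evaluates the formula quoted above.

```r
# Number of distinct binary partitions of an unordered factor with n levels:
# each level goes left or right (2^n assignments), halved because swapping
# left and right gives the same split, minus the one split that leaves a
# side empty.
n_binary_splits <- function(n) 2^(n - 1) - 1

n_binary_splits(3)   # 3 possible splits for a 3-level factor
n_binary_splits(88)  # roughly 1.5e26 for the 88-level V141
```

Even at a billion candidate splits per second, an exhaustive search over that many splits would not finish in any practical time.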
> 
> I'd suggest getting rid of that variable, or collapsing the levels to
> something more reasonable.  The CART book describes some heuristic shortcuts
> for testing only n-1 splits for factors with n levels, but I believe that
> only works for 2-class problems, if I'm not mistaken.
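
One possible way to collapse the levels is sketched below in base R. This is only an assumption about what "more reasonable" could mean: the helper collapse_levels, its keep and other arguments, and the toy data are all hypothetical, and the cutoff of 10 is arbitrary.

```r
# Hypothetical helper: keep the `keep` most frequent levels of a factor
# and lump all remaining levels into a single level named `other`.
collapse_levels <- function(f, keep = 10, other = "other") {
  f <- as.factor(f)
  counts <- sort(table(f), decreasing = TRUE)
  top <- names(counts)[seq_len(min(keep, length(counts)))]
  out <- as.character(f)
  # Leave missing values as NA; relabel only the observed, infrequent levels
  out[!is.na(out) & !(out %in% top)] <- other
  factor(out)
}

# Toy data standing in for the real V141 column (not the poster's data)
set.seed(1)
v <- factor(sample(1001:1088, 500, replace = TRUE))
nlevels(collapse_levels(v, keep = 10))  # at most 11: ten kept levels plus "other"
```

Grouping by frequency is just one choice; grouping levels by domain knowledge may preserve more of the variable's predictive value.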
> 
> Andy
> 