[R] predict() question

Weiwei Shi helprhelp at gmail.com
Tue May 17 20:09:33 CEST 2005


Hi there,
This follows up on yesterday's question: a categorical variable had a
new level in the validation dataset, and predict() complained about
it. I wrote some Python code to work around the problem, but here I am
just curious about the details of the mechanism:

I believe rpart follows CART, so for a categorical variable the
splitting criterion should look like:
is it A or not?
   --yes, go to the left branch
   --no, go to the right branch
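
A minimal Python sketch of that rule, next to the more general form in
which a split assigns each level seen in training to one of two branches.
This is only an illustration, not rpart's implementation: the function
names and the level sets {"A"} and {"B"} are hypothetical.

```python
# Illustrative only: not rpart's implementation. All names are hypothetical.

def route_a_or_not(level):
    """The 'is it A or not?' rule: anything that is not A goes right."""
    return "left" if level == "A" else "right"

def route_partition(level, left_levels, right_levels):
    """Partition rule: each training level is assigned to one branch.
    A level found in neither set has no defined branch, so return None."""
    if level in left_levels:
        return "left"
    if level in right_levels:
        return "right"
    return None

for level in ["A", "B", "C"]:
    print(level, route_a_or_not(level), route_partition(level, {"A"}, {"B"}))
# A left left
# B right right
# C right None
```

Under the first rule a new level C falls through to the right branch;
under the second it has no branch at all, which is one reason a
prediction routine might refuse it.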

So at prediction time, if a new level such as C shows up, predict()
should not need to complain about it: C is simply "not A" and would go
to the right branch. (Of course, if there are many C's in the
validation set, it should complain.) Maybe predict() has to check for
new levels first, just for robustness.
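
For reference, the kind of up-front check I have in mind can be sketched
in a few lines of Python. This is an assumption about what such a check
could look like, not what predict() actually does; the helper names
find_new_levels and mask_new_levels and the sample data are made up.

```python
# Hypothetical helpers: a sketch of a "new level" check, not predict()'s code.

def find_new_levels(train_values, new_values):
    """Return the sorted levels that occur in new_values but not in training."""
    return sorted(set(new_values) - set(train_values))

def mask_new_levels(train_values, new_values):
    """Replace levels unseen in training with None (i.e. treat as missing)."""
    seen = set(train_values)
    return [v if v in seen else None for v in new_values]

train = ["A", "B", "A", "B"]
valid = ["A", "C", "B", "C"]

print(find_new_levels(train, valid))  # ['C']
print(mask_new_levels(train, valid))  # ['A', None, 'B', None]
```

Reporting the unseen levels first, and only then deciding whether to
stop or to treat them as missing, keeps the choice with the user.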

I am not sure whether my understanding is right; please advise!

Thanks,

-- 
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III



