[R] predict() an rpart() model: how to ignore missing levels in a factor

Thu Nov 18 19:40:34 CET 2010

Considering the mechanism behind recursive partitioning, I don't think 
there is any way for you to ignore the crop factor when a level was not 
in the original training set. What decision should be made if, for 
instance, the next split in a decision tree were on crop and the output 
was 5 for apples, 6 for bananas, and you had an instance of jicama? The 
tree can't ignore the crop factor at that point, since the next decision 
hinges on it.

What I think you can do, however, is pre-trim your test set by keeping 
only the rows whose crop level is present in the first (training) set, 
with something like (UNTESTED):

> test.set <- test.set[test.set$crop %in% original.set$crop,]
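
A slightly fuller sketch of the same idea, in case it helps. It is an 
illustration only: the toy data, the column names (c2001, yield) and the 
object names (train, test, fit, keep, pred) are all made up here and are 
not from the original question. It fits rpart() on a training set, then 
predicts only for test rows whose crop code was seen in training and 
leaves NA for the rest.

```r
library(rpart)  # recommended package, shipped with standard R installs

set.seed(1)

# Hypothetical toy data: crop codes stored as numbers, as in the question.
train <- data.frame(c2001 = rep(c(1, 2, 3), each = 20))
train$yield <- c(5, 6, 7)[train$c2001] + rnorm(60, sd = 0.1)

# The test set contains crop codes (13, 24, 35) absent from training.
test <- data.frame(c2001 = c(1, 2, 3, 13, 24, 35))

fit <- rpart(yield ~ factor(c2001), data = train)

# Rows whose crop code was also seen in the training set.
keep <- test$c2001 %in% train$c2001

# Predict only for those rows; rows with unseen codes stay NA.
# drop = FALSE keeps a data frame even when there is a single column.
pred <- rep(NA_real_, nrow(test))
pred[keep] <- predict(fit, newdata = test[keep, , drop = FALSE])
```

Keeping pred the same length as the test set, with NA for the skipped 
rows, makes it easy to average results over repeated splits with 
na.rm = TRUE. Whether skipping those rows is acceptable depends on the 
analysis.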
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly

r-help-bounces at r-project.org wrote on 11/18/2010 12:35:41 PM:

> [R] predict() an rpart() model: how to ignore missing levels in a factor
> 
> jamessc 
> 
> to:
> 
> r-help
> 
> 11/18/2010 12:37 PM
> 
> Sent by:
> 
> r-help-bounces at r-project.org
> 
> 
> I am using an algorithm to split my data set into two random sections
> repeatedly, construct a model using rpart() on one, test on the other,
> and average out the results.
> 
> One of my variables is a factor (crop) where each crop type has a code.
> Some crop types occur infrequently or singly. When the data set is randomly
> split, it may be that the first data set has a crop type which is not
> present in the second and so using predict() I get the error:
> 
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  : 
>   factor 'factor(c2001)' has new level(s) 13, 24, 35
> 
> where c2001 is the crop. I would like the predict function to ignore these
> records. Is there a command which will allow this as part of the predict()
> function? For those with a small number of records (e.g. 3-4), I would hope
> some of the models would have the right balance to allow a prediction to be
> made.
> -- 
> View this message in context: http://r.789695.n4.nabble.com/predict-an-rpart-model-how-to-ignore-missing-levels-in-a-factor-tp3049218p3049218.html
> Sent from the R help mailing list archive at Nabble.com.