[R] error in random forest

Nagu thogiti at gmail.com
Sat Mar 8 02:27:19 CET 2008


Thank you very much. I'll jump into the data and verify the
consistency between the training and testing variables and their
levels.
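
For the archive, here is a rough sketch of the kind of check I have in
mind, not a definitive recipe. The data frames `train` and `test` and the
helper `new_levels` are made-up names on toy data; only base R is used.
It lists, for each factor in the training set, the test-set levels that
never occur in training, and then re-codes the test factor on the
training levels so that unseen values show up as NA.

```r
# Hypothetical toy data standing in for the real training and test sets.
train <- data.frame(f = factor(c("a", "b", "c")), x = c(1, 2, 3))
test  <- data.frame(f = factor(c("a", "d")),      x = c(4, 5))

# Hypothetical helper: for every factor column in the training set,
# return the levels seen in the test set but not in the training set.
new_levels <- function(train, test) {
  is_fac <- vapply(train, is.factor, logical(1))
  out <- lapply(names(train)[is_fac], function(v) {
    setdiff(levels(factor(test[[v]])), levels(train[[v]]))
  })
  names(out) <- names(train)[is_fac]
  Filter(length, out)  # keep only the variables that have a mismatch
}

bad <- new_levels(train, test)
print(bad)

# Re-code the test factor on the training levels; values the training
# set never saw become NA and still have to be dealt with by hand.
test$f <- factor(test$f, levels = levels(train$f))
```

Re-coding only makes the level sets agree. Rows whose value was never
seen in training end up as NA, so they still need to be dropped, imputed
or covered by more training data before predicting.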

On Fri, Mar 7, 2008 at 5:14 PM,  <Bill.Venables at csiro.au> wrote:
> The error message is pretty clear, really.  To spell it out a bit more,
>  what you have done is as follows.
>
>  Your training set has factor variables in it.  Suppose one of them is
>  "f".  In the training set it has 5 levels, say.
>
>  Your test set also has a factor "f", as it must, but it appears that in
>  the test set it has 6 levels, or more, or levels that do not agree with
>  those for "f" in the training set.
>
>  This mismatch means that the predict method for randomForest cannot use
>  this test set.
>
>  What you have to do is make sure that the factor levels agree for every
>  factor in both test and training set. One way to do this is to put the
>  test and training set together with rbind(...) say, and then separate
>  them again.  But even this will still leave you with a problem, because
>  your training set will have some factor levels empty that are not empty
>  in the test set.  The error will most likely be more subtle, though.
>
>  You really need to sort this out yourself.  It is not particularly an R
>  problem, but a confusion over data.  To be useful, your training set
>  needs to cover the field for all levels of every factor.  Think about it.
>
>
>
>  -----Original Message-----
>  From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>  On Behalf Of Nagu
>  Sent: Saturday, 8 March 2008 5:37 AM
>  To: r-help at r-project.org; r-help at stat.math.ethz.ch
>  Subject: [R] error in random forest
>
>  Hi,
>
>  I get the following error when I try to predict the probabilities of a
>  test sample:
>
>  Error in predict.randomForest(fit.EBA.OM.rf.50, x.OM, type = "prob") :
>   New factor levels not present in the training data
>
>  I have about 630 predictor variables in the dataset x.OM (25 factor
>  variables and the remaining are continuous variables). Any ideas on
>  how to trace it?
>
>  Thank you,
>  Nagu
>


