[R] problem with certain data sets when using randomForest

Liaw, Andy andy_liaw at merck.com
Wed Aug 31 15:47:29 CEST 2005


I've been trying to play catch-up on R-help since DSC2005.  This one must
have slipped through...

This is what I'd do:

iris.sub <- subset(iris, Species %in% c("setosa", "virginica"))
iris.sub$Species <- factor(iris.sub$Species)

That last line rebuilds the factor from the values actually present, which
drops the empty "versicolor" level.  You can then run randomForest on that
data.
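
For anyone who wants to check the effect step by step, here is a minimal
sketch using only base R.  The droplevels() call is an assumption about your
R version (it exists only in R >= 2.12.0), and the randomForest fit is
guarded so the snippet still runs if the package is not installed.  The name
iris.sub2 is just for illustration.

```r
# Subset iris to two species; the unused level "versicolor" stays attached.
iris.sub <- subset(iris, Species %in% c("setosa", "virginica"))
levels(iris.sub$Species)   # still lists all three levels
table(iris.sub$Species)    # "versicolor" has a count of zero

# Re-creating the factor keeps only the levels that actually occur.
iris.sub$Species <- factor(iris.sub$Species)
levels(iris.sub$Species)

# Assumption: R >= 2.12.0. droplevels() does the same for every factor
# column of a data frame in one call.
iris.sub2 <- droplevels(subset(iris, Species %in% c("setosa", "virginica")))
identical(levels(iris.sub2$Species), levels(iris.sub$Species))

# Fit the forest only if the randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(Species ~ ., data = iris.sub, ntree = 5)
  print(fit)
}
```

Because the levels are derived from the data, nothing here needs to change
when a different data set or a different subset of classes is used.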

HTH,
Andy

> From: Martin Lam
> 
> Hi,
> 
> Since I've had no replies to my previous post about this
> problem, I am posting it again in the hope that someone
> notices it. The problem is that the randomForest
> function doesn't accept a dataset whose instances
> cover only a subset of all the classes. So a
> dataset whose instances belong only to class "a"
> or "b", from the levels "a", "b" and "c", doesn't work
> because there is no instance that has class "c". Is
> there any way to solve this problem?
> 
> library("randomForest")
> 
> # load the iris plant data set
> dataset <- iris
> 
> numberarray <- array(1:nrow(dataset), nrow(dataset), 1)
> 
> # include only instances with Species = setosa or virginica
> indices <- t(numberarray[(dataset$Species == "setosa" |
>                           dataset$Species == "virginica") == TRUE])
> 
> finaldataset <- dataset[indices,]
> 
> # just to let you see the 3 classes
> levels(finaldataset$Species)
> 
> # create the random forest
> randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
> 
> # The error message I get
> Error in randomForest.default(m, y, ...) : 
>         Can't have empty classes in y.
> 
> # The problem is that the finaldataset doesn't contain
> # any instances of "versicolor", so I think the only
> # way to solve this problem is by changing the levels
> # that "Species" has to only "setosa" and "virginica";
> # correct me if I'm wrong.
> 
> # So I tried to change the levels but I got stuck:
> 
> # get the possible unique classes
> uniqueItems <- unique(levels(finaldataset$Species))
> 
> # the problem!
> newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
> uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
> 
> # Error message
> Error: syntax error
> 
> # In the help they use constant names to rename the
> # levels, so this works (but that's not what I want
> # because I don't want to change the code every time I
> # use another data set):
> newlevels <- list("setosa" = c(uniqueItems[1],
> uniqueItems[2]), "virginica" = uniqueItems[3])
> 
> levels(finaldataset$Species) <- newlevels
> 
> levels(finaldataset$Species)
> 
> finaldataset$Species
> 
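
On the syntax error itself: list(uniqueItems[1] = ...) fails because an
argument name in a call has to be a literal name or string, not an
expression.  One way around it, sketched here under the assumption that you
really do want to merge levels rather than drop them, is to attach the
computed names afterwards with setNames().  The variable f is hypothetical
and stands in for finaldataset$Species.

```r
# Two-species subset of the iris factor; all three levels are still attached.
f <- iris$Species[iris$Species != "versicolor"]
uniqueItems <- levels(f)   # "setosa" "versicolor" "virginica"

# Build the list first, then name its elements from computed values.
newlevels <- setNames(
  list(uniqueItems[1:2], uniqueItems[3]),   # old levels to merge
  c(uniqueItems[1], uniqueItems[3])         # new level names
)

# Assigning a named list to levels() merges the old levels listed in each
# element into the new level given by that element's name.
levels(f) <- newlevels
levels(f)
table(f)
```

Note that this merges "versicolor" into "setosa" instead of removing it, so
it gives a sensible result only because the subset has no "versicolor" rows.
Rebuilding the factor with factor() avoids the relabelling altogether.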
> ---------------------------
> 
> Thanks in advance,
> 
> Martin



