[R] problem with certain data sets when using randomForest

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Aug 26 18:19:39 CEST 2005


Look at ?"[.factor":

 	finaldataset$Species <- finaldataset$Species[,drop=TRUE]

drops the unused "versicolor" level and so solves this.
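Here is a minimal, self-contained sketch of the same fix applied to the iris subset from the quoted message. It uses base R only and does not call randomForest itself. The table() call shows why randomForest complains: the factor keeps a level that has zero observations. droplevels() is not part of the original reply. It is included on the assumption that you are running a newer version of R that provides it.

```r
# Keep only the setosa and virginica rows of iris
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]

# The factor still carries the unused "versicolor" level,
# which randomForest reports as an empty class
levels(finaldataset$Species)   # "setosa" "versicolor" "virginica"
table(finaldataset$Species)    # versicolor has a count of 0

# Subsetting a factor with drop = TRUE discards the unused levels
finaldataset$Species <- finaldataset$Species[, drop = TRUE]
levels(finaldataset$Species)   # "setosa" "virginica"

# Assumption: newer R versions provide droplevels(), which drops
# unused levels from every factor column of a data frame at once
alt <- droplevels(iris[iris$Species %in% c("setosa", "virginica"), ])
levels(alt$Species)            # "setosa" "virginica"
```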

On Fri, 26 Aug 2005, Martin Lam wrote:

> Hi,
>
> Since I've had no replies to my previous post about my
> problem, I am posting it again in the hope that someone
> notices it. The problem is that the randomForest
> function doesn't accept data sets whose instances
> contain only a subset of all the classes. So a
> data set whose instances belong only to class "a"
> or "b", out of the levels "a", "b" and "c", doesn't work
> because there is no instance of class "c". Is
> there any way to solve this problem?
>
> library("randomForest")
>
> # load the iris plant data set
> dataset <- iris
>
> # include only instances with Species = setosa or
> # virginica
> indices <- which(dataset$Species == "setosa" |
>                  dataset$Species == "virginica")
>
> finaldataset <- dataset[indices,]
>
> # just to show that all 3 levels are still there
> levels(finaldataset$Species)
>
> # create the random forest
> randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
>
> # The error message I get
> Error in randomForest.default(m, y, ...) :
>        Can't have empty classes in y.
>
> # The problem is that finaldataset doesn't contain any
> # instances of "versicolor", so I think the only way to
> # solve this problem is to change the levels of "Species"
> # to only "setosa" and "virginica". Correct me if I'm wrong.
>
> # So I tried to change the levels but I got stuck:
>
> # get the possible unique classes
> uniqueItems <- unique(levels(finaldataset$Species))
>
> # the problem!
> newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
> uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
>
> # Error message
> Error: syntax error
>
> # The help page uses literal names to rename the
> # levels, so this works (but it's not what I want,
> # because I don't want to change the code every time I
> # use another data set):
> newlevels <- list("setosa" = c(uniqueItems[1],
> uniqueItems[2]), "virginica" = uniqueItems[3])
>
> levels(finaldataset$Species) <- newlevels
>
> levels(finaldataset$Species)
>
> finaldataset$Species
>
> ---------------------------
>
> Thanks in advance,
>
> Martin
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
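The syntax error in the quoted message comes from the left-hand side of `=` inside list(). It must be a literal name, so an expression such as uniqueItems[1] cannot appear there. The sketch below builds the same list with computed names using setNames(). It also shows a generic alternative: calling factor() on an existing factor rebuilds it with only the levels that actually occur. The level merge simply mirrors the poster's hard-coded example. The variable name `species` is hypothetical. The drop = TRUE subsetting at the top of this reply makes the renaming unnecessary.

```r
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]
uniqueItems <- levels(finaldataset$Species)  # "setosa" "versicolor" "virginica"

# list(uniqueItems[1] = ...) is a syntax error; setNames() attaches
# names computed at run time to an unnamed list instead
newlevels <- setNames(list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3]),
                      c(uniqueItems[1], uniqueItems[3]))

# Merging "versicolor" into "setosa" is harmless only because no
# versicolor rows remain; this mirrors the poster's example
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)  # "setosa" "virginica"

# Generic alternative for any data set: factor() on a factor keeps the
# original level order but drops levels with no observations
species <- iris$Species[iris$Species != "versicolor"]  # hypothetical name
species <- factor(species)
levels(species)               # "setosa" "virginica"
```

The factor() route needs no knowledge of the level names, so the same line works unchanged on another data set.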

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



