[R] Truncate levels to use randomForest

Martin Lam tmlammail at yahoo.com
Fri Aug 26 10:15:35 CEST 2005


Hi,

I will explain my problem with this example:

library("randomForest")

# load the iris plant data set
dataset <- iris

numberarray <- array(1:nrow(dataset), nrow(dataset), 1)

# include only instances with Species = setosa or virginica
indices <- t(numberarray[(dataset$Species == "setosa" |
dataset$Species == "virginica") == TRUE])

finaldataset <- dataset[indices,]

# just to let you see the 3 classes
levels(finaldataset$Species)

# create the random forest
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)

# The error message I get
Error in randomForest.default(m, y, ...) : 
        Can't have empty classes in y.

# The problem is that the finaldataset doesn't contain
# any instances of "versicolor", so I think the only
# way to solve this problem is by changing the levels
# of "Species" to only "setosa" and "virginica";
# correct me if I'm wrong.
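
A minimal sketch of one way to drop the unused level, assuming the same iris data: re-creating the factor with factor() keeps only the levels that actually occur. The %in% filter is swapped in here for brevity and is not the indexing used in the post.

```r
# Subset iris to two species (assumption: %in% replaces the index array above)
dataset <- iris
finaldataset <- dataset[dataset$Species %in% c("setosa", "virginica"), ]

# Subsetting keeps all three levels, although "versicolor" no longer occurs
levels(finaldataset$Species)

# Re-creating the factor keeps only the levels that are present
finaldataset$Species <- factor(finaldataset$Species)
levels(finaldataset$Species)
```

With the empty class removed, the randomForest() call should no longer stop on the empty-class check, since every remaining level has observations.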

# So I tried to change the levels but I got stuck:

# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))

# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
uniqueItems[2]), uniqueItems[3] = uniqueItems[3])

# Error message
Error: syntax error

# In the help they use constant names to rename the
# levels, so this works (but that's not what I want
# because I don't want to change the code every time I
# use another data set):
newlevels <- list("setosa" = c(uniqueItems[1],
uniqueItems[2]), "virginica" = uniqueItems[3])

levels(finaldataset$Species) <- newlevels

levels(finaldataset$Species)

finaldataset$Species
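
The syntax error arises because the left-hand side of `=` inside list() must be a literal name or string, not an expression such as uniqueItems[1]. A sketch of one way around this, assuming the same variable names as above: build the list without names, then assign the names computed at run time with names<-. The %in% filter is again only a shorthand for the subsetting step.

```r
# Same two-species subset of iris (assumption: %in% used for brevity)
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]
uniqueItems <- levels(finaldataset$Species)

# Build the values first: old levels 1 and 2 are merged, level 3 stays alone
newlevels <- list(c(uniqueItems[1], uniqueItems[2]), uniqueItems[3])

# Attach the new level names afterwards, taken from the data
names(newlevels) <- uniqueItems[c(1, 3)]

levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
```

Because the names come from uniqueItems, nothing is hard-coded, although the positions 1, 2 and 3 still assume a factor with exactly three levels.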

---------------------------

Thanks in advance,

Martin



