[R] randomForest.error: length of response must be the same as predictors

Gavin Simpson gavin.simpson at ucl.ac.uk
Thu Jul 3 10:50:22 CEST 2008


On Thu, 2008-07-03 at 12:11 +0530, Soumyadeep Nandi wrote:
> My data looks like:
> A,B,C,D,Class
> 1,2,0,2,cl1
> 1,5,1,9,cl1
> 3,2,1,2,cl2
> 7,2,1,2,cl2
> 2,2,1,2,cl2
> 1,2,1,5,cl2
> 0,2,1,2,cl2
> 4,2,1,2,cl2
> 3,5,1,2,cl2
> 3,2,12,3,cl2
> 3,2,4,2,cl2
> 
> **The steps followed are:
> trainfile <- read.csv("TrainFile",head=TRUE)
> datatrain <- subset(trainfile,select=c(-Class))
> classtrain <- (subset(trainfile,select=Class))
> rf <- randomForest(datatrain, classtrain)
> 
> Error in randomForest.default(classtrain, datatrain) :
>   length of response must be the same as predictors
> In addition: Warning message:
> In randomForest.default(classtrain, datatrain) :
>   The response has five or fewer unique values.  Are you sure you want to do
> regression?
> 
> Could someone suggest me where I am going wrong.

Yep, look at class(classtrain):

> class(classtrain)
[1] "data.frame"

subset() returns a data.frame, which is a special case of a list. The
length of a list (and therefore of a data frame) is its number of
components (columns), not its number of rows, which may not be what
you expect:

> length(classtrain)
[1] 1

There is *1* component to the list, one '$' bit that you can get at.
Hence, randomForest() complains because, to it, the response does not
have as many elements as there are rows of predictors when the response
is measured using length().
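
To make the length() point concrete, here is a minimal base-R sketch.
The three-row data frame is made up purely for illustration (it is not
the poster's file); it shows how a one-column data frame differs from
the vector it contains:

```r
# Hypothetical one-column data frame standing in for 'classtrain'
classtrain <- data.frame(Class = factor(c("cl1", "cl1", "cl2")))

length(classtrain)   # 1: number of columns (list components)
nrow(classtrain)     # 3: number of rows

# Extracting the single column gives a plain factor vector
y <- classtrain[, 1]
length(y)            # 3: one element per row

# Single-bracket list-style indexing keeps the data frame wrapper
is.data.frame(classtrain[1])   # TRUE
```

classtrain$Class and classtrain[[1]] extract the same vector as
classtrain[, 1].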

Note that ?randomForest does state that y should be a response 'vector',
so you are not supplying what is required.

Two ways to proceed:

rf <- randomForest(Class ~ ., data = trainfile)

or if you really don't want the formula parsing, force the empty
dimension to be dropped, by subsetting:

rf <- randomForest(datatrain, classtrain[,1])

[NB: as classtrain is of class "data.frame", drop() will not work on it,
because a data frame doesn't have a dim attribute]
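
For anyone wanting to try both routes end to end, here is a
self-contained sketch. It assumes the randomForest package is installed
(the fits are skipped if it is not) and reads the data shown above from
an inline string rather than from "TrainFile". stringsAsFactors = TRUE
is set explicitly because, from R 4.0.0 onwards, read.csv() no longer
converts strings to factors by default, and a classification forest
needs a factor response. The set.seed() value is arbitrary.

```r
# Inline copy of the data from the original post
csv <- "A,B,C,D,Class
1,2,0,2,cl1
1,5,1,9,cl1
3,2,1,2,cl2
7,2,1,2,cl2
2,2,1,2,cl2
1,2,1,5,cl2
0,2,1,2,cl2
4,2,1,2,cl2
3,5,1,2,cl2
3,2,12,3,cl2
3,2,4,2,cl2"

trainfile <- read.csv(text = csv, header = TRUE, stringsAsFactors = TRUE)

datatrain  <- subset(trainfile, select = -Class)  # predictors (data frame)
classdf    <- subset(trainfile, select = Class)   # still a data frame
classtrain <- classdf[, 1]                        # response as a factor vector

# drop() leaves the data frame untouched, so it is no help here
still_df <- is.data.frame(drop(classdf))

# The response now has one element per row of predictors
same_len <- length(classtrain) == nrow(datatrain)

if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)
  # Option 1: formula interface
  rf1 <- randomForest::randomForest(Class ~ ., data = trainfile)
  # Option 2: x/y interface with a vector response
  rf2 <- randomForest::randomForest(datatrain, classtrain)
  print(rf2)
}
```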

HTH

G

> 
> Thanks