[R] Fwd: varimp_in_party_package

Torsten Hothorn Torsten.Hothorn at stat.uni-muenchen.de
Tue Jun 21 17:57:07 CEST 2011


On Thu, 16 Jun 2011, Jinrui Xu wrote:

> Thanks for your feedback.
> I don't think the problem is caused by many levels. There is only one factor 
> column in my input data: the class label, which has two levels.
>
> Below is my code. The command "data.cforest.varimp <- 
> varimp(data.cforest, conditional = TRUE)" reports "Error in 
> model.matrix.default(as.formula(f),data = blocks): term 1 would require 4e+17 
> columns"
>
> I also attached my input file. I hope you can run it to check what the 
> problem is. Thanks a lot!
>
> PS: The code below takes about 10 minutes on one CPU and uses up to 2.5 GB 
> of memory. You can reduce the number of columns in rawinput, which reduces 
> the computation and still reproduces the same error.
>
> library(randomForest)
> library(party)
>
> set.seed(71)
>
> rawinput <- read.table("featureSelection_rec.vectors")
> rawinput$V1 <- as.factor(as.numeric(rawinput$V1))
>
> data.controls <- cforest_unbiased(ntree=500, mtry=3)
> data.cforest <- cforest(V1~.,data=rawinput,controls=data.controls)
> data.cforest.varimp <- varimp(data.cforest, conditional = TRUE)
>

Hi Jinrui,

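it turns out that, with the default parameters, there are 47 variables to 
condition on for your data set, and that's way too many: the conditioning 
grid is built from the interaction of these variables, so even at two cells 
per variable it would need 2^47 (about 1.4e14) columns. You can reduce the 
number of conditioning variables by increasing the `threshold' argument of 
varimp() to something like 0.8.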
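A minimal sketch of that change, assuming the party package is installed. The original input file is not available here, so the data frame below is simulated purely for illustration (a two-level label V1 and ten numeric predictors) and only stands in for rawinput; ntree is also lowered to keep the run short. The one substantive difference from the failing call is the threshold argument of varimp().

```r
# Sketch only: simulated stand-in data, NOT the poster's featureSelection_rec.vectors.
library(party)

set.seed(71)
n <- 200
# Hypothetical predictors V2..V11 and a two-level class label V1
X <- as.data.frame(matrix(rnorm(n * 10), ncol = 10,
                          dimnames = list(NULL, paste0("V", 2:11))))
V1 <- factor(as.integer(X$V2 + X$V3 + rnorm(n) > 0))
rawinput <- data.frame(V1 = V1, X)

# Fewer trees than the original 500, just to keep this example quick
data.controls <- cforest_unbiased(ntree = 50, mtry = 3)
data.cforest <- cforest(V1 ~ ., data = rawinput, controls = data.controls)

# threshold (default 0.2) is only used when conditional = TRUE: a covariate is
# added to the conditioning grid for a variable only if its association with
# that variable exceeds threshold. Raising it to 0.8 keeps only strongly
# associated covariates, which keeps the grid small.
data.cforest.varimp <- varimp(data.cforest, conditional = TRUE, threshold = 0.8)
print(sort(data.cforest.varimp, decreasing = TRUE))
```

Values of threshold closer to 1 condition on fewer covariates, so the grid stays tractable at the cost of adjusting for weaker correlations.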

Best,

Torsten

>
>
>
>> There is a factor with (too) many levels in your data frame `rawinput'.
>> 
>> summary(rawinput)
>> 
>> will tell you which one.
>> 
>> Torsten
>
>
>
> Quoting Torsten Hothorn <Torsten.Hothorn at stat.uni-muenchen.de>:
>
>>> 
>>> Hello everyone,
>>> 
>>> I use the following commands to get the important variables from a 
>>> training dataset.
>>> 
>>> 
>>> data.controls <- cforest_unbiased(ntree=500, mtry=3)
>>> data.cforest <- cforest(V1~.,data=rawinput,controls=data.controls)
>>> data.cforest.varimp <- varimp(data.cforest, conditional = TRUE)
>>> 
>>> I got error: "Error in model.matrix.default(as.formula(f),data = blocks): 
>>> term 1 would require 4e+17 columns"
>>> 
>>> 
>>> I reduced the data dimension to 150, but the problem still exists, so I 
>>> guess there is some other problem. Please give me some help or hints. Thanks!
>>> 
>>> Jinrui
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>> 
>> 
>
>


