[R] Dummy variables using rfe in caret for variable selection

Ren rajreni.kaul at gmail.com
Sun May 1 16:52:05 CEST 2011


I'm trying to run rfe() from the caret package for variable selection and am
getting an error. My data frame includes a categorical variable (State) with 3 levels, which I convert to dummy variables.


library(caret)
x <- chlDescr
y <- chl
# create dummy variables
levels(x$State) <- c("AL", "GA", "FL")
dummy <- model.matrix(~State, x)
z <- cbind(dummy, x)
# remove the original State factor column
w <- z[, c(-4)]
subsets <- c(2:8)
ctrl <- rfeControl(functions = lmFuncs, method = "cv", verbose = FALSE,
                   returnResamp = "final")
lmProfile <- rfe(w, y, sizes = subsets, rfeControl = ctrl)

Returns: 
Error in `[.data.frame`(x, , retained, drop = FALSE) : 
  undefined columns selected
In addition: Warning message:
In predict.lm(object, x) :
  prediction from a rank-deficient fit may be misleading
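A possible reading of the warning (an interpretation, not confirmed in this thread): model.matrix(~State, x) returns a constant "(Intercept)" column of ones in addition to the dummy columns, and lm() already fits its own intercept, so the design becomes rank deficient. That column name is also not a syntactic R name, which may be why lmFuncs later fails to match variable names back to the columns of w ("undefined columns selected"); that part is a guess. The base-R sketch below uses made-up toy data (the names df, depth and resp are hypothetical, standing in for chlDescr and chl) to show both points.

```r
# Hypothetical toy data standing in for chlDescr/chl (not the poster's data)
set.seed(1)
df <- data.frame(State = factor(rep(c("AL", "GA", "FL"), each = 4)),
                 depth = rnorm(12))
resp <- rnorm(12)

# With the default treatment contrasts, model.matrix() returns an
# "(Intercept)" column plus (levels - 1) dummy columns
dummy <- model.matrix(~State, df)
print(colnames(dummy))

# Mimic the cbind() step: the constant "(Intercept)" column becomes a predictor
z <- cbind(dummy, df)[, -4]   # drop the original State factor (4th column)
tmp <- z
tmp$resp <- resp

# lm() adds its own intercept, so the "(Intercept)" column is aliased with it
# and one coefficient comes back as NA (a rank-deficient fit)
fit <- lm(resp ~ ., data = tmp)
print(coef(fit))

# data.frame()-style name checking rewrites the non-syntactic name
print(make.names("(Intercept)"))
```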

When I remove the dummy variables, the function runs fine:
# remove State variable
Desc <- chlDescr[,-c(1)]
lmProfile <- rfe(Desc, y, sizes = subsets, rfeControl = ctrl)
Returns:
Recursive feature selection

Outer resamping method was 10 iterations of cross-validation. 

Resampling performance over subset size:

 Variables   RMSE Rsquared  RMSESD RsquaredSD Selected
         1 0.2462   0.7454 0.09529    0.17362         
         2 0.2408   0.7680 0.07860    0.12543         
         3 0.2134   0.8285 0.06649    0.09043         
         4 0.2011   0.8609 0.03463    0.05928        *
         5 0.2019   0.8622 0.03421    0.05675         
         6 0.2019   0.8622 0.03421    0.05675         


Can lmFuncs handle dummy variables? How does it need to be modified so it
can?
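A minimal sketch of one possible workaround, assuming the intercept column and its name are the problem (not confirmed in this thread): build the dummy columns without the intercept and give every column a syntactic name before calling rfe(). make_dummies() is a hypothetical helper, not part of caret, and the toy data are made up.

```r
# Hypothetical helper (not part of caret): expand factor columns into dummy
# columns, drop the constant intercept column, and make all names syntactic
make_dummies <- function(df) {
  mm <- model.matrix(~ ., data = df)   # intercept + dummies + numeric columns
  mm <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
  colnames(mm) <- make.names(colnames(mm), unique = TRUE)
  as.data.frame(mm)
}

# Hypothetical toy data standing in for chlDescr/chl
set.seed(1)
df <- data.frame(State = factor(rep(c("AL", "GA", "FL"), each = 4)),
                 depth = rnorm(12))
resp <- rnorm(12)

w <- make_dummies(df)
print(names(w))

# Once lm() adds its own intercept, the design is full rank (no NA coefficients)
fit <- lm(resp ~ ., data = cbind(w, resp = resp))
print(coef(fit))

# With caret loaded, the result could then be passed to rfe() as before, e.g.
# lmProfile <- rfe(make_dummies(chlDescr), y, sizes = subsets, rfeControl = ctrl)
```

One design consequence: rfe() ranks each dummy column as a separate predictor, so it may keep one State dummy and drop the other; this approach does not keep or drop the State factor as a unit. In recent caret versions, dummyVars() with fullRank = TRUE is another way to build full-rank dummy columns.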

I'm new at this, so any help would be appreciated. Thanks,
Reni
Data files:
chl.csv: http://r.789695.n4.nabble.com/file/n3487861/chl.csv
chlDescr.csv: http://r.789695.n4.nabble.com/file/n3487861/chlDescr.csv
