[R] Selecting A List of Columns

peter dalgaard pdalgd at gmail.com
Fri May 17 12:02:05 CEST 2013


On May 17, 2013, at 08:51 , Sparks, John James wrote:

> Dear R Helpers,
> 
> I need help with a slightly unusual situation in which I am trying to
> select some columns from a data frame.  I know how to use the subset
> statement with column names as in:

Note that subset() is a convenience function for command-line use. Its non-standard evaluation tricks tend to become inconveniences if you try to use subset() inside a function (I can say that, I wrote the blasted thing...). Just use normal subsetting instead and everything behaves much more predictably. If ImportantVars is a vector of column names, use

mtcars[ImportantVars] 

(or mtcars[,ImportantVars], which also works for matrices).
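
For illustration, here is a minimal sketch of both forms. The three names in ImportantVars are placeholders chosen for the example, not the variables a random forest would necessarily rank highest, and kept, kept2, m and m_kept are names made up here.

```r
# Placeholder column names (an assumption for this sketch); in practice
# they would come from the importance table of the fitted forest.
ImportantVars <- c("wt", "disp", "hp")

# List-style indexing: always returns a data frame.
kept <- mtcars[ImportantVars]

# Matrix-style indexing: same columns for a data frame.
kept2 <- mtcars[, ImportantVars]

# Matrix-style indexing is the form that carries over to a real matrix.
m <- as.matrix(mtcars)
m_kept <- m[, ImportantVars]

names(kept)
dim(m_kept)
```

One thing to watch with the comma form: if only a single column is selected, a data frame or matrix is dropped to a plain vector unless drop = FALSE is given.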


> 
> 
> x=as.data.frame(matrix(c(1,2,3,
>        1,2,3,
>        1,2,2,
>        1,2,2,
>        1,1,1),ncol=3,byrow=T))
> 
> all.cols<-colnames(x)
> to.keep<-all.cols[1:2]
> 
> Kept<-subset(x,select=to.keep)
> Kept
> 
> However, if I want to select some columns based on a selection of the most
> important variables from a random forest then I find myself stuck.  The
> example below demonstrates the problem.
> 
> 
> library(randomForest)
> 
> data(mtcars)
> mtcars.rf <- randomForest(mpg ~ ., data=mtcars,importance=TRUE)
> Importance<-data.frame(mtcars.rf$importance)
> Importance
> 
> 
> 
> MSEImportance<-head(Importance[order(Importance$X.IncMSE,
> decreasing=TRUE),],3)
> MSEVars<-row.names(MSEImportance)
> MSEVars<-data.frame(MSEVars,stringsAsFactors = FALSE)
> colnames(MSEVars)<-"Vars"
> 
> NodeImportance<-head(Importance[order(Importance$IncNodePurity,decreasing=TRUE),],
> 3)
> NodeVars<-row.names(NodeImportance)
> NodeVars<-data.frame(NodeVars,stringsAsFactors = FALSE)
> colnames(NodeVars)<-"Vars"
> 
> 
> ImportantVars<-rbind(MSEVars,NodeVars)
> ImportantVars<-unique(ImportantVars)
> nrow(ImportantVars)
> ImportantVars<-as.character(ImportantVars)
> ImportantVars
> CarsVarsKept<-subset(mtcars,select=ImportantVars)
> Error in `[.data.frame`(x, r, vars, drop = drop) :
>  undefined columns selected
> 
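
A likely cause of this particular error, as far as I can tell from the code above: ImportantVars is a one-column data frame at that point, and as.character() on a data frame does not return its contents as a vector of names. It deparses each column into a single string. The sketch below rebuilds a similar data frame with placeholder names (an assumption, not the actual importance results) and pulls the column out instead; collapsed and var_names are names made up for the example.

```r
# A one-column data frame like the one built in the quoted code;
# the variable names are placeholders for this sketch.
ImportantVars <- data.frame(Vars = c("wt", "disp", "hp"),
                            stringsAsFactors = FALSE)

# as.character() on a data frame yields one deparsed string per column,
# so this has length 1 and matches no column of mtcars.
collapsed <- as.character(ImportantVars)
length(collapsed)

# Extracting the column gives a plain character vector of names.
var_names <- ImportantVars$Vars
CarsVarsKept <- mtcars[var_names]
names(CarsVarsKept)
```
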
> Any help on how to select these columns from the data frame would be most
> appreciated.
> 
> --John J. Sparks, Ph.D.
> 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


