[R] Pre-model Variable Reduction

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Dec 9 14:08:31 CET 2008


Harsh wrote:
> Hello All,
> I am trying to carry out variable reduction. I have no information
> about the dependent variable, only the X variables.
> In selecting the variables to keep, I have considered the following criteria:
> 1) Percentage of missing values in each column/variable
> 2) Variance of each variable, with a cut-off value.
> 
> I recently came across Weka and found that there is an RWeka package
> which would allow me to make use of Weka through R.
> Weka provides a "Genetic search" variable reduction method, but I
> could not find its R code implementation in the RWeka Pdf file on
> CRAN.
> 
> I looked for other R packages that allow me to do variable reduction
> without considering a dependent variable. I came across 'dprep'
> package but it does not have a Windows implementation.
> 
> Moreover, my dataset contains both continuous and categorical
> variables, with the categorical variables having 3 levels, 10 levels,
> and so on, up to a maximum of 50 levels (e.g. states in the USA).
> 
> Any suggestions in this regard will be much appreciated.
> 
> Thank you
> 
> Harsh Singhal
> Decision Systems,
> Mu Sigma, Inc.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
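
The two screening criteria listed in the quoted message (percentage of
missing values and a variance cut-off) could be sketched in base R roughly
as follows. This is only an illustration: the toy data frame `d` and the
cut-offs `max_missing` and `min_var` are assumptions made up for the
example, not values from the original post.

```r
# Toy data, invented for illustration only
set.seed(42)
d <- data.frame(
  a     = rnorm(100),
  b     = c(rep(NA, 60), rnorm(40)),   # 60% missing
  c     = rep(5, 100),                 # constant, so zero variance
  state = factor(sample(c("TN", "NY", "CA"), 100, replace = TRUE))
)

max_missing <- 0.5    # hypothetical cut-off: drop if more than 50% missing
min_var     <- 1e-8   # hypothetical cut-off: drop if variance is below this

# 1) Fraction of missing values in each column
frac_missing <- colMeans(is.na(d))

# 2) Variance of each numeric column (factors are skipped here)
is_num  <- vapply(d, is.numeric, logical(1))
col_var <- vapply(d[is_num], var, numeric(1), na.rm = TRUE)

drop_missing <- names(frac_missing)[frac_missing > max_missing]
drop_lowvar  <- names(col_var)[col_var < min_var]

keep    <- setdiff(names(d), c(drop_missing, drop_lowvar))
reduced <- d[keep]
print(keep)
```

Note that a raw variance cut-off depends on each variable's scale, and it
does not apply directly to categorical variables, so the threshold would
need to be chosen with that in mind.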

Take a look at the redun function in the Hmisc package, which does
redundancy analysis.
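
A minimal sketch of what a call to redun might look like, assuming Hmisc
is installed. The simulated variables `x1`, `x2`, `x3`, `grp` and the
`r2 = 0.9` threshold are assumptions chosen for illustration. redun tries
to predict each variable from all the others, with no response variable
involved, and flags as redundant those that can be predicted with an
R-squared at or above the threshold.

```r
library(Hmisc)  # assumes the Hmisc package is installed

# Simulated predictors, invented for illustration only
set.seed(1)
n   <- 200
x1  <- rnorm(n)
x2  <- rnorm(n)
x3  <- x1 + x2 + rnorm(n, sd = 0.05)  # almost fully determined by x1 and x2
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
d   <- data.frame(x1, x2, x3, grp)

# One-sided formula: only X variables, no dependent variable.
# r2 is the R-squared threshold for declaring a variable redundant;
# nk is the number of knots used for continuous variables.
r <- redun(~ x1 + x2 + x3 + grp, data = d, r2 = 0.9, nk = 3)

print(r)
r$Out   # variables flagged as redundant
r$In    # variables retained
```

Categorical variables can be included as factors, as `grp` is here; see
?redun for the options that control how they are handled.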

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


