[R] Pre-model Variable Reduction

Harsh singhalblr at gmail.com
Tue Dec 9 11:34:01 CET 2008


Hello All,
I am trying to carry out variable reduction without a dependent
variable: I have only the X variables, as it were.
To decide which variables to keep, I have considered the following
criteria:
1) Percentage of missing values in each column/variable
2) Variance of each variable, dropping those below a cut-off value
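The two screening criteria above can be sketched as follows. This is a minimal illustration in Python rather than R. The function name `reduce_variables`, the column-dict layout (None marking a missing value) and the thresholds (20% missing, variance 0.01) are all hypothetical choices. The variance check is applied only to numeric columns, since variance is undefined for categorical variables such as US states.

```python
from statistics import pvariance


def reduce_variables(data, max_missing_pct=20.0, min_variance=0.01):
    """Return the names of columns that pass both screening criteria.

    data: dict mapping column name -> list of values; None marks a missing value.
    Thresholds are hypothetical defaults, not recommendations.
    """
    kept = []
    for name, values in data.items():
        n = len(values)
        present = [v for v in values if v is not None]
        # Criterion 1: percentage of missing values in the column.
        missing_pct = 100.0 * (n - len(present)) / n if n else 100.0
        if missing_pct > max_missing_pct:
            continue
        # Criterion 2: variance cut-off, applied only to numeric columns.
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        if present and numeric and pvariance(present) < min_variance:
            continue
        kept.append(name)
    return kept


# Hypothetical example data.
data = {
    "income": [52.0, 61.5, None, 48.2, 70.1],  # 20% missing, varies: kept
    "flag": [1.0, 1.0, 1.0, 1.0, 1.0],         # zero variance: dropped
    "score": [None, None, None, 3.0, 4.0],     # 60% missing: dropped
    "state": ["NY", "CA", "TX", None, "NY"],   # categorical, 20% missing: kept
}
print(reduce_variables(data))  # ['income', 'state']
```

Because raw variance depends on each variable's units, a single cut-off treats variables measured on different scales unequally. Standardizing the variables first, or comparing a scale-free measure, is one common adjustment.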

I recently came across Weka and found that the RWeka package lets me
use Weka from within R.
Weka provides a "Genetic search" method for variable reduction, but I
could not find an R interface to it in the RWeka reference manual
(PDF) on CRAN.

I looked for other R packages that perform variable reduction without
a dependent variable and came across the 'dprep' package, but it is
not available for Windows.

Moreover, my dataset contains both continuous and categorical
variables. The categorical variables range from 3 levels up to a
maximum of 50 levels (e.g. the states of the USA).

Any suggestions in this regard will be much appreciated.

Thank you

Harsh Singhal
Decision Systems,
Mu Sigma, Inc.


