[R] Pre-model Variable Reduction

Ravi Varadhan RVaradhan at jhmi.edu
Tue Dec 9 15:35:21 CET 2008


Principal components analysis does "dimensionality reduction" but NOT
"variable reduction": each component is a linear combination of all the
original variables, so none of them is actually dropped.  However,
Jolliffe's 2004 book on PCA does discuss the problem of selecting a subset
of variables, with the goal of representing the internal variation of the
original multivariate vector as well as possible (see Section 6.3 of that
book).  I do not think that these methods can handle missing data.  The
most important issue is to think about the goal of variable reduction and
then choose an appropriate optimality criterion for achieving that goal.
In most instances of variable selection, the criterion being optimized is
never explicitly stated.
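
To make the distinction concrete, here is a rough sketch.  It assumes
complete, all-numeric data (the built-in USArrests set), and
select_by_loadings() is a hypothetical helper written for this note, not a
function from any package.  The rule it uses (for each of the first m
components, keep the not-yet-chosen variable with the largest absolute
loading) is only loosely in the spirit of the loading-based ideas in that
section and is not a faithful reproduction of any particular method.

```r
# Sketch only: pick m of the original variables using PCA loadings.
# 'select_by_loadings' is a hypothetical helper, not part of any package.
select_by_loadings <- function(X, m) {
  pc   <- prcomp(X, center = TRUE, scale. = TRUE)  # PCA on standardized data
  load <- abs(pc$rotation)                         # |loadings|, variables in rows
  keep <- character(0)
  for (j in seq_len(m)) {
    cand <- setdiff(rownames(load), keep)          # variables not yet chosen
    keep <- c(keep, cand[which.max(load[cand, j])])
  }
  keep
}

X  <- USArrests                      # built-in data: numeric, no missing values
pc <- prcomp(X, scale. = TRUE)

# Dimension reduction: two scores per row, but every score uses all 4 variables
dim(pc$x[, 1:2])
all(pc$rotation[, 1:2] != 0)

# Variable reduction: two of the original columns are kept unchanged
kept <- select_by_loadings(X, 2)
kept
```

Note that the implicit criterion here ("largest loading on the leading
components") is itself a choice; a different goal would call for a
different rule.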

Ravi.

------------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
------------------------------------------------------------------------------------


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Gabor Grothendieck
Sent: Tuesday, December 09, 2008 8:00 AM
To: Harsh
Cc: r-help at r-project.org
Subject: Re: [R] Pre-model Variable Reduction

See:

?prcomp
?princomp
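
As a minimal illustration of what those two help pages describe (a sketch
on the built-in USArrests data, which is numeric and has no missing
values; nothing here is specific to the poster's data):

```r
X  <- USArrests                                   # built-in numeric data, no NAs
p1 <- prcomp(X, center = TRUE, scale. = TRUE)     # based on the SVD of the data
p2 <- princomp(X, cor = TRUE)                     # based on eigen() of the correlation matrix

# Proportion of total variance carried by each component
prop_var <- p1$sdev^2 / sum(p1$sdev^2)
round(cumsum(prop_var), 3)

# Both functions give the same proportions of variance
all.equal(prop_var, unname(p2$sdev^2 / sum(p2$sdev^2)))
```

Both functions expect numeric input, so categorical columns would have to
be recoded or left out before calling them.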

On Tue, Dec 9, 2008 at 5:34 AM, Harsh <singhalblr at gmail.com> wrote:
> Hello All,
> I am trying to carry out variable reduction. I do not have information 
> about the dependent variable, and have only the X variables as it 
> were.
> In selecting variables I wish to keep, I have considered the following
> criteria.
> 1) Percentage of missing value in each column/variable
> 2) Variance of each variable, with a cut-off value.
>
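
The two criteria above could be sketched as follows.  The toy data frame
and both cut-offs (max_missing, min_var) are assumptions made up for
illustration, not recommended values.

```r
# Toy data; thresholds below are arbitrary assumptions, not recommendations
set.seed(1)
d <- data.frame(
  a = rnorm(100),                                      # ordinary continuous variable
  b = c(rnorm(40), rep(NA, 60)),                       # 60% missing
  c = rep(5, 100),                                     # zero variance
  s = factor(sample(state.abb, 100, replace = TRUE))   # categorical, up to 50 levels
)
max_missing <- 0.5    # drop columns with more than 50% NA
min_var     <- 1e-8   # drop numeric columns with variance at or below this

frac_na <- colMeans(is.na(d))                          # 1) fraction missing per column
is_num  <- vapply(d, is.numeric, logical(1))
v <- rep(NA_real_, ncol(d))
v[is_num] <- vapply(d[is_num], var, numeric(1), na.rm = TRUE)  # 2) variance

# The variance rule is applied to numeric columns only; factors pass through
keep <- frac_na <= max_missing & (!is_num | v > min_var)
names(d)[keep]
```

Variance depends on the units of each variable, so a single cut-off across
columns only makes sense if the variables are on comparable scales.
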
> I recently came across Weka and found that there is an RWeka package 
> which would allow me to make use of Weka through R.
> Weka provides a "Genetic search" variable reduction method, but I 
> could not find its R code implementation in the RWeka Pdf file on 
> CRAN.
>
> I looked for other R packages that allow me to do variable reduction 
> without considering a dependent variable. I came across 'dprep'
> package but it does not have a Windows implementation.
>
> Moreover, I have a dataset that contains continuous and categorical 
> variables, some categorical variables having 3 levels, 10 levels and 
> so on, till a max 50 levels (E.g. States in the USA).
>
> Any suggestions in this regard will be much appreciated.
>
> Thank you
>
> Harsh Singhal
> Decision Systems,
> Mu Sigma, Inc.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



