[R] Salient feature selection

Bert Gunter gunter.berton at gene.com
Mon Jul 2 17:38:02 CEST 2007


See, e.g., the pls package. However, be forewarned: this is a vague problem
(what kind of predictors/responses do you want? Linear combinations?
Nonlinear combinations? ...). I believe the problem is also NP-hard, so
solutions depend heavily on the algorithm (and even on the starting values).
For these reasons, statistical inference is difficult at best, and probably
not even meaningful in your context, as I doubt that you have a random
sample of anything. A personal recommendation (with which many disagree, I
know): seek extreme parsimony in both predictors and responses if you want
results to be replicable/scientifically meaningful.
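
To make the "linear combinations" reading concrete, here is a minimal sketch
using canonical correlation via cancor() from base R's stats package. This is
a swapped-in illustration, not the pls approach mentioned above. The data are
simulated, and the names X, Y and cc are made up for the example.

```r
set.seed(2)
n <- 50

# Hypothetical data: 3 explanatory and 3 response variables
X <- matrix(rnorm(n * 3), n, 3)
Y <- matrix(rnorm(n * 3), n, 3)

# Build in one real relationship so there is something to find
Y[, 1] <- X[, 1] + X[, 2] + rnorm(n, sd = 0.5)

# Canonical correlation: pairs of linear combinations of X and of Y
# chosen to be maximally correlated with each other
cc <- cancor(X, Y)

print(round(cc$cor, 2))    # canonical correlations, largest first
print(round(cc$xcoef, 2))  # weights defining the X-side combinations
```

The weights are estimated from the same data they are evaluated on, so the
leading canonical correlation is optimistic, which is one more reason to keep
the number of variables on both sides small.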

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Andy Weller
Sent: Monday, July 02, 2007 8:17 AM
To: R-help at stat.math.ethz.ch
Subject: [R] Salient feature selection

I am relatively new to R. I am hoping that someone will be able to point 
me in the right direction and/or suggest a technique/package/reference 
that will help me with the following. I have:

a) Some explanatory variables (integers, real) - these are "real world" 
physical descriptions, i.e. counts of features, etc

b) Some response variables (integers, real) - these are image analysis 
measurements (gray-value distributions, textural descriptors, etc) of 
the same things represented in (a)

and I want to find out which variables from the two sets correlate best,
i.e. the salient features from BOTH sets (not for classification purposes).

For example, if (a) has 10 explanatory variables and (b) has 10 response
variables, I want to test the complete set of explanatory variables with 
each individual response (or vice versa). So, explanatory 1-10 with 
response 1, explanatory 1-10 with response 2, explanatory 1-10 with 
response 3, etc...
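
As a rough illustration of the per-response setup described above, a minimal
base-R sketch might look like the following. The data are simulated, and all
names (X, Y, fits, r2, best) are made up for the example. It assumes ordinary
linear regression via lm(), which is only one possible reading of "correlate
best".

```r
set.seed(1)
n <- 50

# Simulated stand-ins for 10 explanatory and 10 response variables
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("x", 1:10)))
Y <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("y", 1:10)))

# Make y1 depend on x1 and x2 so there is something to find
Y[, "y1"] <- 2 * X[, "x1"] - X[, "x2"] + rnorm(n, sd = 0.5)

# Regress each response in turn on the complete set of explanatory variables
fits <- lapply(colnames(Y), function(nm) lm(Y[, nm] ~ X))
names(fits) <- colnames(Y)

# R-squared for each response, strongest fit first
r2 <- sapply(fits, function(f) summary(f)$r.squared)
print(round(sort(r2, decreasing = TRUE), 2))

# Coefficient table (estimates, standard errors, t and p values)
# for the response with the strongest fit
best <- names(which.max(r2))
print(round(coef(summary(fits[[best]])), 3))
```

With 10 regressions of 10 coefficients each, some small p-values will turn up
by chance alone, so the coefficient tables are better read as a screening aid
than as confidence statements.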

This should ultimately tell me which "real world" physical features are
most strongly related to the image analysis measurements (along with a
confidence level for each relationship).

I hope this makes sense?

I have used SPSS AnswerTree's "Exhaustive CHAID" before to select a 
subset of input features for a complete set of output features to aid 
the creation of artificial neural networks. I want to do a similar
thing, but it is not important that ALL explanatory and response
variables be used/selected.

I hope that I have been clear in my intentions and I look forward to 
your replies, Andy

R-help at stat.math.ethz.ch mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
