[R] Salient feature selection
gunter.berton at gene.com
Mon Jul 2 17:38:02 CEST 2007
See e.g. the pls package. However, be forewarned: this is a vague problem
(what kind of predictors/responses do you want? -- linear combinations?
nonlinear combinations? ...). The problem is also NP-Hard I believe, so
solutions are very algorithm (and even starting value)-dependent. For these
reasons, statistical inference is difficult, at best, and probably not even
meaningful in your context, as I doubt that you have a random sample of
anything. A personal recommendation (with which many disagree, I know): seek
extreme parsimony in both predictors and responses for results to be
Genentech Nonclinical Statistics
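
As a rough illustration of the pls package suggestion above, here is a minimal sketch of a partial least squares fit with several responses at once. It assumes the pls package is installed; the data are simulated stand-ins, and the names X, Y, dat, and the choice of 3 components are illustrative assumptions, not anything taken from the original data.

```r
library(pls)  # assumes install.packages("pls") has already been run

# Simulated stand-in data (an assumption): 60 objects, 10 variables per set.
set.seed(42)
n <- 60
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("x", 1:10)))
Y <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("y", 1:10)))
# Give y1 a real dependence on x1 and x2 so there is some structure to recover.
Y[, "y1"] <- X[, "x1"] + X[, "x2"] + rnorm(n, sd = 0.3)

# Store each matrix as a single column so the formula Y ~ X sees whole blocks.
dat <- data.frame(X = I(X), Y = I(Y))

# PLS regression with 3 components (arbitrary choice) and 10-fold cross-validation.
fit <- plsr(Y ~ X, ncomp = 3, data = dat, scale = TRUE, validation = "CV")

# Cross-validated prediction error, per response and per number of components.
cv_err <- RMSEP(fit, estimate = "CV")

# Loadings: how each explanatory variable contributes to each component.
x_load <- unclass(loadings(fit))
```

Printing cv_err and x_load gives a descriptive picture only. In line with the caveats above, the result depends on the algorithm and the number of components, so it is better read as exploration than as formal inference.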
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Andy Weller
Sent: Monday, July 02, 2007 8:17 AM
To: R-help at stat.math.ethz.ch
Subject: [R] Salient feature selection
I am relatively new to R. I am hoping that someone will be able to point
me in the right direction and/or suggest a technique/package/reference
that will help me with the following. I have:
a) Some explanatory variables (integers, real) - these are "real world"
physical descriptions, i.e. counts of features, etc
b) Some response variables (integers, real) - these are image analysis
measurements (gray-value distributions, textural descriptors, etc) of
the same things described in (a)
and I want to find out which variables from the two sets correlate best -
i.e. the salient features from BOTH sets (not for classification purposes).
For example, if a has 10 explanatory variables and b has 10 response
variables, I want to test the complete set of explanatory variables with
each individual response (or vice versa). So, explanatory 1-10 with
response 1, explanatory 1-10 with response 2, explanatory 1-10 with
response 3, etc...
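
The setup just described (all explanatory variables against one response at a time) can be sketched in base R roughly as follows. This is only an illustration on simulated data: the matrices X and Y, their column names, and the use of ordinary linear models are assumptions, not something specified in the question.

```r
# Simulated stand-in data (an assumption): 50 objects, 10 explanatory
# variables and 10 response variables.
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("x", 1:10)))
Y <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("y", 1:10)))
# Make y1 depend on x1 and x2 so that there is something to find.
Y[, "y1"] <- 2 * X[, "x1"] - X[, "x2"] + rnorm(n, sd = 0.5)

dat <- data.frame(X)

# One linear model per response, each using all 10 explanatory variables.
fits <- lapply(colnames(Y), function(resp) lm(Y[, resp] ~ ., data = dat))
names(fits) <- colnames(Y)

# Adjusted R-squared per response: how well the full explanatory set tracks it.
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)

# Coefficient table (estimate, std. error, t value, p value) for the
# response that is explained best.
best <- names(which.max(adj_r2))
coef_table <- summary(fits[[best]])$coefficients
```

Sorting adj_r2 ranks the responses, and coef_table shows which explanatory variables carry the fit for the top one. With 10 responses and 10 predictors the p-values are subject to heavy multiple testing, so they are best treated as descriptive rather than as confidence statements.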
This should ultimately tell me which "real world" physical features are
related best with the image analysis measurements (with the confidence
level between them).
I hope this makes sense?
I have used SPSS AnswerTree's "Exhaustive CHAID" before to select a
subset of input features for a complete set of output features to aid
the creation of artificial neural networks. I want to do a similar
thing, but it is not important that ALL explanatory and response
variables are used/selected.
I hope that I have been clear in my intentions and I look forward to
your replies, Andy