[R] Histogram Ranking

John Day jday at csi-inc.com
Fri Sep 6 20:30:40 CEST 2002


Hello,

This is not exactly an R question, but I suspect that there is an R 
procedure that does what I am calling (for lack of a better name) 
"histogram ranking".

I'm trying to evaluate a set of regression features by segregating by 
target class and comparing the feature histograms. My idea is that if the 
histograms are the same for two different classes then there is no 
predictive power in those features. Conversely, if the histograms are 
different then there is probably some predictive "juice" that we can 
squeeze out of the features with regression.

The histograms are computed by partitioning each feature into equally 
spaced bins over its span and counting the sample values that fall in each 
bin. This is done for each target class, so the resulting histograms are 
the feature distributions conditioned on target class.
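
To make the binning step concrete, here is a minimal sketch for a single 
numeric feature. It is written in Python purely for illustration; the 
function name class_histograms and the n_bins parameter are hypothetical, 
and the choice of 10 bins is an assumption, not something fixed by the 
procedure described above.

```python
def class_histograms(values, labels, n_bins=10):
    """Count samples per equally spaced bin, separately for each class.

    values : list of numbers (one feature)
    labels : list of class labels, same length as values
    Returns a dict mapping each label to a list of n_bins counts.
    """
    lo, hi = min(values), max(values)
    # Bin width over the full span of the feature (all classes pooled).
    # Fall back to 1.0 if every value is identical, to avoid dividing by zero.
    width = (hi - lo) / n_bins or 1.0
    hists = {}
    for x, y in zip(values, labels):
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        hists.setdefault(y, [0] * n_bins)[idx] += 1
    return hists
```

The bin edges are computed from the pooled data so that the histograms for 
different classes share the same partition and can be compared bin by bin.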

Since the histograms are numeric vectors, we can measure the "goodness" of 
a feature set by evaluating the "distance" between histograms: the larger 
the distance, the better the features should separate the classes.
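
As one possible reading of "distance", the sketch below normalizes two 
count vectors to proportions and takes half the sum of absolute 
differences (the total variation distance). This particular metric is an 
assumption chosen for illustration, not necessarily the one intended 
above; the function names are hypothetical, and the code is again Python.

```python
def normalize(counts):
    """Convert bin counts to proportions; an all-zero histogram stays all zero."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def total_variation(h1, h2):
    """Half the L1 distance between two normalized histograms.

    Both histograms must use the same bins. The result is 0 for
    identical shapes and 1 for histograms with no overlapping bins.
    """
    p, q = normalize(h1), normalize(h2)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Normalizing first keeps the comparison from being dominated by unequal 
class sizes, since only the shapes of the two histograms are compared.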

Now I'm no statistics expert. Have I re-invented some "wheel" here? What is 
the canonical name for this kind of analysis? Is this kind of analysis 
routinely done in R? [Is there a "better" way to do all this?]

Thanks,
John Day




