[R] R's Data Dredging Philosophy for Distribution Fitting

Thu Jul 15 01:22:14 CEST 2010

Forum, 

I'm a grad student in Civil Eng.  I took some Stats classes that required
students to learn R, and I have since taken to R and use it for as much as I
can.  Back in my lab/office, many of my fellow grad students still use
proprietary software at the behest of advisers who are familiar with the
recommended software (Statistica, @Risk (an Excel add-on), etc.).  I have spent
a lot of time learning R and am confident it can generally out-process,
out-graph, or more simply stated, out-perform most of these other software
packages.  However, one area where my view has been humbled is distribution
fitting.

I started by reading through
http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf.  After that
I started digging around on this forum and found posts like this one
http://r.789695.n4.nabble.com/Fitting-usual-distributions-td800000.html#a800000
that are close to what I'm after.  That is, given an observation dataset, I
would like to call a function that cycles through numerous distributions
(common or not) and then ranks them for me by goodness of fit, using the
Chi-Square, Kolmogorov-Smirnov, and/or Anderson-Darling statistics, for example.

This question was asked back in 2004:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/37053.html but the response
was that this kind of thing was available neither in R nor in proprietary software, to the
best of the responding author's memory.  In 2010, however, this is no longer
true as @Risk's
(http://www.palisade.com/risk/?gclid=CKvblPSM7KICFZQz5wodDRI2fg)
"Distribution Fitting" function does this very thing.  And it is here that
my R pride has taken a hit.  Based on the first response to the question
posed here
http://r.789695.n4.nabble.com/Which-distribution-best-fits-the-data-td859448.html#a859448,
is it fair to say that the R community (I realize this is only one view) would
take exception to this kind of "data mining"?  

Unless I've missed a discussion of a package that does this very thing, it
seems as though I would need to code something up using fitdistr() and do
all the ranking myself.  Undoubtedly that would be a good exercise for me,
but it's hard for me to believe R would be runner-up to @Risk in something
like distribution fitting.
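
To make the question concrete, here is a rough sketch of the kind of thing I
mean.  It assumes fitdistr() from MASS for the fitting and ks.test() for the
ranking score; rank_fits() and its "candidates" argument are names made up
for illustration, not an existing function.  Since the parameters are
estimated from the same data, the usual KS p-values would not be valid, so
only the statistic (plus AIC) is kept as a rough ranking:

```r
library(MASS)  # provides fitdistr()

# Hypothetical helper: fit each candidate distribution by maximum likelihood,
# then rank the fits by Kolmogorov-Smirnov statistic (smaller = closer fit).
# 'candidates' maps a fitdistr() family name to its CDF function name.
rank_fits <- function(x,
                      candidates = c(normal      = "pnorm",
                                     lognormal   = "plnorm",
                                     exponential = "pexp",
                                     gamma       = "pgamma",
                                     weibull     = "pweibull")) {
  rows <- lapply(names(candidates), function(dist) {
    # Skip families that cannot be fitted (e.g. lognormal on negative data)
    fit <- tryCatch(suppressWarnings(fitdistr(x, dist)),
                    error = function(e) NULL)
    if (is.null(fit)) return(NULL)

    # Pass the fitted parameters to the matching CDF by name
    ks <- suppressWarnings(
      do.call(ks.test, c(list(x, candidates[[dist]]), as.list(fit$estimate)))
    )

    data.frame(distribution = dist,
               ks_stat      = unname(ks$statistic),
               aic          = AIC(fit),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out[order(out$ks_stat), ]
}

# Example on simulated data (the true distribution here is gamma)
set.seed(1)
x <- rgamma(200, shape = 2, rate = 0.5)
rank_fits(x)
```

Chi-Square and Anderson-Darling columns could presumably be added the same
way, but I have not tried that here.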

Eric
-- 
View this message in context: http://r.789695.n4.nabble.com/R-s-Data-Dredging-Philosophy-for-Distribution-Fitting-tp2289508p2289508.html
Sent from the R help mailing list archive at Nabble.com.