[R] Overlaying graphs

Paul Meagher paul at datavore.com
Fri Sep 5 16:31:07 CEST 2003


From: "Damon Wischik" <djw1005 at cam.ac.uk>

> Paul Meagher wrote:
> > 2. Does R have a suite of "best-fit" tools for finding the
> > best-fitting probability distribution for any observed probability
> > distribution?
>
> I think that the best-fitting probability distribution for an observed
> probability distribution is the empirical distribution of your
> observations.
>
> (Perhaps you have some other criteria than just goodness of fit?)

You can certainly use the empirical distribution of observations to
construct your probability distribution and you are correct that, in some
sense, this would be the best fitting probability distribution.

Lately I have been asking myself why we bother in the first place to use
theoretical probability distributions to model our empirical
distributions.  Why not construct the probability distribution directly from
the data itself?  I think that in some cases, this is the correct route to
go.  Computers allow us to make inferences about the probability of certain
outcomes using these irregularly shaped distributions.  These inferences may
be more accurate than those drawn from any of the available theoretical
probability distributions.
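
As a minimal sketch of what "constructing the probability distribution
directly from the data" could look like in R, the following uses the base
functions ecdf(), quantile() and sample().  The data are simulated purely for
illustration, and the threshold of 5 is an arbitrary assumption, not something
taken from a real data set.

```r
# Simulated sample standing in for real observations (hypothetical data).
set.seed(42)
x <- rgamma(200, shape = 2, rate = 0.5)

# ecdf() returns a step function: Fhat(t) is the proportion of
# observations less than or equal to t.
Fhat <- ecdf(x)

# Estimated probability that a new observation exceeds 5 (arbitrary threshold).
p_exceed <- 1 - Fhat(5)

# Empirical 90th percentile, read directly off the sample.
q90 <- quantile(x, probs = 0.9, names = FALSE)

# Drawing from the empirical distribution is sampling with replacement.
resample <- sample(x, size = 1000, replace = TRUE)

print(p_exceed)
print(q90)
```

No distributional family is assumed anywhere in this sketch; every quantity
comes from the observed values alone.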

The main reasons I can come up with for not using the empirical distribution
itself as your probability distribution are:

1. Over-fitting, which limits your ability to generalize to new situations.
This, I think, is the most important reason for engaging in the exercise of
fitting your data to a theoretical distribution.

2. Inferences about your random variable are easier to derive from a
theoretical distribution.  This is the second most important reason.

3. Anyone who plays with numbers constitutionally tends towards platonism.
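
Reasons 1 and 2 can be illustrated with a rough sketch.  It assumes the MASS
package (a recommended package normally shipped with R) and its fitdistr()
function for maximum-likelihood fitting; the simulated data and the two
candidate families are arbitrary choices made for illustration only.  The
empirical distribution gives zero probability to anything beyond the largest
observation, whereas a fitted theoretical distribution still assigns a small
positive probability there.

```r
library(MASS)  # provides fitdistr(); assumed to be installed

set.seed(1)
x <- rexp(100, rate = 0.2)   # hypothetical observations

# Maximum-likelihood fits of two candidate families.
fit_exp   <- fitdistr(x, "exponential")
fit_lnorm <- fitdistr(x, "lognormal")

# Lower AIC indicates a better trade-off between fit and complexity.
aic <- c(exponential = AIC(fit_exp), lognormal = AIC(fit_lnorm))

# Probability of a value beyond anything observed so far.
threshold    <- max(x) + 1
p_empirical  <- 1 - ecdf(x)(threshold)   # zero: no data beyond max(x)
p_parametric <- unname(pexp(threshold,
                            rate = fit_exp$estimate["rate"],
                            lower.tail = FALSE))

print(aic)
print(c(empirical = p_empirical, parametric = p_parametric))
```

Comparing candidates by AIC is only one possible criterion; a goodness-of-fit
test such as ks.test() could be used alongside it.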

Regards,
Paul

> Damon.
>
>



