[R] Method for checking automatically which distributions fit a data set

Frank E Harrell Jr f.harrell at vanderbilt.edu
Mon Jul 7 19:00:54 CEST 2008


David Reinke wrote:
> The function ks.test(x, y, ...) performs a Kolmogorov-Smirnov test on a set
> of sample values x against a distribution y. x is a numeric vector of data
> values; y can be either a second numeric vector of data values (two-sample
> test) or a cumulative distribution function such as pnorm, given as a
> function or as a character string naming it.
> 
> David Reinke
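
For reference, a minimal sketch of the two ways ks.test() is usually
called. The data are simulated and the seed, sample sizes, and parameter
values are assumptions chosen only for illustration. In both forms x is a
vector of raw observations, not cumulative values.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(100, mean = 5, sd = 2)    # simulated sample (illustrative data)

# One-sample test: x is raw data, y names a CDF; the extra arguments are
# passed on to that CDF as its parameters.
one <- ks.test(x, "pnorm", mean = 5, sd = 2)

# Two-sample test: both arguments are vectors of raw data.
y <- rexp(100, rate = 1 / 5)
two <- ks.test(x, y)

one$p.value
two$p.value
```

Note that the one-sample p-value is only valid when the parameters are
specified in advance; if they are estimated from the same data, the test
is no longer calibrated.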

If you search for the distribution that best fits the empirical
distribution, the resulting estimates will have variances (once model
uncertainty is taken into account through bootstrapping) that are equal
to those from the empirical CDF, so nothing is gained.  You can use the
empirical CDF as the "final answer" unless prior knowledge of the
distributional shape is available.
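
One way to examine this for a given data set is to bootstrap both
estimators, repeating the distribution selection inside every resample.
The sketch below is only an illustration under stated assumptions: the
data are simulated, the target quantity P(X <= 3) and the candidate set
(normal, lognormal, gamma) are hypothetical choices, and AIC is used as
the selection rule although the post does not name one.

```r
library(MASS)  # fitdistr(); MASS ships with standard R installations

set.seed(42)
x <- rgamma(200, shape = 2, rate = 0.5)  # simulated data, illustrative only
cutoff <- 3                               # hypothetical target: P(X <= 3)

# Estimate from the empirical CDF.
est_ecdf <- function(d) ecdf(d)(cutoff)

# Estimate from the best-fitting candidate. The choice is redone on every
# call, so model uncertainty enters the bootstrap variance.
est_best <- function(d) {
  fits <- list(
    normal    = fitdistr(d, "normal"),
    lognormal = fitdistr(d, "lognormal"),
    gamma     = fitdistr(d, "gamma")
  )
  best <- names(which.min(sapply(fits, AIC)))
  cdf  <- switch(best, normal = pnorm, lognormal = plnorm, gamma = pgamma)
  do.call(cdf, c(list(cutoff), as.list(fits[[best]]$estimate)))
}

B <- 200  # number of bootstrap resamples (kept small for speed)
boot <- replicate(B, {
  d <- sample(x, replace = TRUE)
  c(ecdf = est_ecdf(d), best = suppressWarnings(est_best(d)))
})

apply(boot, 1, sd)  # bootstrap standard errors of the two estimators
```

How close the two standard errors turn out will depend on the sample
size, the candidate set, and the quantity being estimated, so a single
run like this neither proves nor refutes the general statement.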

Frank Harrell

> 
> Senior Transportation Engineer/Economist
> Dowling Associates, Inc.
> 180 Grand Avenue, Suite 250
> Oakland, California 94612-3774
> 510.839.1742 x104 (voice)
> 510.839.0871 (fax)
> www.dowlinginc.com
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of hadley wickham
> Sent: Monday, July 07, 2008 8:10 AM
> To: Ben Bolker
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Method for checking automatically which distributions fit
> a data set
> 
>>> Suppose I have a vector of data.
>>> Is there a method in R to help us automatically
>>> suggest which distributions fits to that data
>>> (e.g. normal, gamma, multinomial etc) ?
>>>
>>> - Gundala Viswanath
>>> Jakarta - Indonesia
>>>
>> See
>>
>> https://stat.ethz.ch/pipermail/r-help/2008-June/166259.html
>>
>> For example, normal vs. gamma might be a sensible question
>> (for which you can use fitdistr() as suggested above), but
>> "multinomial" implies a very specific kind of response --
>> discrete data with a specified number of possible outcomes.
> 
> Yes - the question as it stands is poorly stated.  If you have a small
> (finite) set of candidate distributions, you can use some kind of
> likelihood-based statistic to determine which fits the data best.  But
> what is the population of distributions in this case?  All
> distributions that you see in stats101?  All distributions that have
> names?  All continuous distributions?
> 
> Hadley
> 
> 
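
The likelihood-based comparison over a small, fixed candidate set
mentioned above could look roughly like the following. This is a sketch,
not a recommendation: the data are simulated, the five candidate
families are an arbitrary choice, and AIC is just one possible
likelihood-based statistic.

```r
library(MASS)  # fitdistr(); MASS ships with standard R installations

set.seed(7)
x <- rgamma(150, shape = 3, rate = 1)  # simulated positive data

# A small, explicitly chosen candidate set (an assumption of this sketch).
candidates <- c("normal", "lognormal", "gamma", "weibull", "exponential")

# Fit each candidate by maximum likelihood.
fits <- lapply(setNames(candidates, candidates),
               function(dist) suppressWarnings(fitdistr(x, dist)))

# Collect log-likelihood and AIC; lower AIC means a better trade-off
# between fit and number of parameters.
comparison <- data.frame(
  logLik = sapply(fits, function(f) as.numeric(logLik(f))),
  AIC    = sapply(fits, AIC)
)
comparison[order(comparison$AIC), ]
```

The ranking only says which of the listed candidates fits best; it says
nothing about distributions that were left out of the list, which is the
point of the question about the population of distributions.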


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University


