[R] Fitting Distributions Directly From a Histogram

Mon Jun 19 16:41:15 CEST 2006

I'm desperately uninformed about the methods you have been discussing, but for a non-parametric solution, how about splines?

Carl de Boor (1978) outlines a method of area-preserving quadratic splines to fit histograms. All you need to know is the value each bar represents and its interval. I am currently implementing this in S-Plus. This method is included in the Spline Toolbox for Matlab, which de Boor authored.
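
As a rough illustration of the area-preserving idea in R, here is a minimal sketch. It is not de Boor's quadratic-spline algorithm: it swaps in a simpler technique, interpolating the cumulative proportions at the bin edges with a monotone cubic Hermite spline (`splinefun` with `method = "monoH.FC"`) and differentiating. The derivative is piecewise quadratic, non-negative, and integrates to exactly each bin's proportion, but it is only continuous, not smooth, at the breaks. The function name `hist_spline_density` and the bin edges and counts are made up for the example.

```r
# Area-preserving density from histogram bins (illustrative sketch).
# NOT de Boor's method: a monotone cubic Hermite interpolant of the
# cumulative proportions is differentiated to obtain the density.
hist_spline_density <- function(breaks, counts) {
  cum <- c(0, cumsum(counts)) / sum(counts)   # cumulative proportion at each bin edge
  cdf <- splinefun(breaks, cum, method = "monoH.FC")
  list(cdf = cdf, density = function(x) cdf(x, deriv = 1))
}

# Made-up bin edges and counts, for illustration only.
breaks <- c(0, 1, 2, 3, 4, 5)
counts <- c(5, 20, 40, 25, 10)
fit <- hist_spline_density(breaks, counts)

# Area under the fitted density over each bin, next to the bin proportions.
areas <- sapply(seq_along(counts), function(i)
  integrate(fit$density, breaks[i], breaks[i + 1])$value)
print(cbind(bin_proportion = counts / sum(counts), fitted_area = areas))
```

The two printed columns should agree up to numerical integration error, since the area over a bin is just the difference of the interpolated cumulative proportions at its two edges.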

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof Brian Ripley
Sent: Monday, June 19, 2006 2:41 AM
To: Spencer Graves
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Fitting Distributions Directly From a Histogram

On Sun, 18 Jun 2006, Spencer Graves wrote:

> 	  Won't interval censoring using 'survreg' in the 'survival' package
> handle this?
> (http://finzi.psych.upenn.edu/R/library/survival/html/survreg.html)

Yes, but only for distributions it supports (and I think even with a 
user-supplied distribution only for positive values).

Coding up a way to fit MLEs to grouped data would be a fairly simple 
exercise based on fitdistr, but finding suitable starting values for 
optimization would be more difficult.
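
A minimal sketch of what such an exercise might look like, under stated assumptions: it calls `optim` directly rather than building on `fitdistr`, it handles only the normal distribution, and it takes its starting values from the moments of the bin midpoints (one simple choice, not necessarily a robust one). The function name `fit_grouped_normal` and the simulated data are made up for the example. The likelihood is the product over bins of (F(upper) - F(lower))^count.

```r
# Grouped-data maximum likelihood for a normal distribution (illustrative sketch).
# breaks: bin edges (length k + 1); counts: observations per bin (length k).
fit_grouped_normal <- function(breaks, counts) {
  k <- length(counts)
  mids <- (breaks[-(k + 1)] + breaks[-1]) / 2
  n <- sum(counts)
  # Starting values: moments computed as if every observation sat at its bin midpoint.
  m0 <- sum(counts * mids) / n
  s0 <- sqrt(sum(counts * (mids - m0)^2) / n)
  # Negative log-likelihood; the sd is optimized on the log scale to keep it positive.
  negloglik <- function(par) {
    p <- diff(pnorm(breaks, mean = par[1], sd = exp(par[2])))
    p <- pmax(p, .Machine$double.xmin)   # guard against log(0)
    -sum(counts * log(p))
  }
  opt <- optim(c(m0, log(s0)), negloglik, method = "BFGS")
  list(mean = opt$par[1], sd = exp(opt$par[2]),
       loglik = -opt$value, convergence = opt$convergence)
}

# Made-up example: simulate data, then keep only the histogram.
set.seed(1)
h <- hist(rnorm(1000, mean = 10, sd = 2), breaks = 20, plot = FALSE)
fit <- fit_grouped_normal(h$breaks, h$counts)
print(fit)
```

For other distributions one would replace `pnorm` with the relevant distribution function and choose starting values to suit, which is where the difficulty mentioned above comes in.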

It is quite rare these days for only the grouped data to be available, now 
that most data collection is automated.  I did something like this for 
analytical chemists in the mid 1980s (pre-R and indeed pre-S for me), but 
their need had vanished by the time I started to port things to S.

>
> 	  Hope this helps.
> 	  Spencer Graves
>
> Vincent Goulet wrote:
>> On Monday 12 June 2006 06:51, Lorenzo Isella wrote:
>>> Dear All,
>>>
>>> A simple question: functions like fitdistr should be ideal to analyze
>>> samples of data taken from a univariate distribution, but what if,
>>> rather than the raw observations, you are given only a histogram?
>>
>> Let's assume that you have not only the histogram itself, but also the breaks
>> and the counts per bin. Then you have what is called grouped data, or at least
>> that's what we call it in Actuarial Science. Maximum likelihood estimation is
>> feasible for such data, but it is slightly more complicated. "Loss Models" by
>> Klugman, Panjer & Willmot (Wiley) covers this.
>>
>> I'm now thinking of adding this to my actuarial science package "actuar"...
>>
>>> I was thinking about artificially generating a set of data
>>> corresponding to the counts binned in the histogram, but this sounds
>>> too cumbersome.
>>> Another question is the following: fitdistr provides the value of the
>>> log-likelihood function, but what if I want e.g. a chi-square test to
>>> get some insight into the goodness of fit?
>>
>> Goodness of fit tests for grouped data are also covered in Loss Models.
>>
>>> I am sure there must be a way to get it straightforwardly without
>>> coding it myself.
>>
>> Once you have the theory, I'm afraid for now you will have to code the
>> estimation procedure yourself.
>>
>> Cheers.
>>
>>> Many thanks
>>>
>>> Lorenzo
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide!
>>> http://www.R-project.org/posting-guide.html
>>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595