[R] How hist() decides breaks?

(Ted Harding) Ted.Harding at manchester.ac.uk
Mon May 19 13:00:32 CEST 2008


On 19-May-08 09:38:48, jim holtman wrote:
> Why don't you specifically tell hist what breaks to use:
> 
> hist(x, breaks=seq(min(x), max(x), length=50), include.lowest=TRUE)

Yes, I'm aware that it can be forced in that kind of way.
Indeed, that is what "?hist" says, in effect.
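
For concreteness, a minimal sketch of the forced version, using
made-up skewed data (the vector x and the seed are assumptions for
illustration, not my simulation output). A vector of 50 breakpoints
gives 49 cells, and hist() uses it as given:

```r
# Hypothetical skewed sample standing in for one simulated variable
set.seed(1)
x <- rexp(10000)

# Explicit breakpoints: 50 equally spaced values spanning the data
b <- seq(min(x), max(x), length.out = 50)

# plot = FALSE returns the histogram object without drawing it
H <- hist(x, breaks = b, include.lowest = TRUE, plot = FALSE)

length(H$breaks)   # number of breakpoints actually used
length(H$counts)   # number of cells (one fewer than the breakpoints)
```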

What I was interested in is what hist() actually does in response
to something like "breaks=50", and why. This was something I was
not able to find out from the documentation.
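
One way to probe it, as a sketch only: the source of hist.default
appears to hand a single number to pretty(), which picks "round"
breakpoints covering range(x) and treats n as a target rather than
a requirement. The data below are made up, and the comparison with
pretty() rests on that reading of the source rather than on
anything in ?hist:

```r
# Hypothetical skewed sample standing in for one simulated variable
set.seed(1)
X <- rexp(10000)

# What hist() returns when given a single number as a "suggestion"
H <- hist(X, breaks = 50, plot = FALSE)

# Assumed equivalent: round breakpoints from pretty(), aiming at
# about 50 intervals over the range of the data
b <- pretty(range(X), n = 50, min.n = 1)

length(H$breaks)                  # need not be 50 or 51
isTRUE(all.equal(H$breaks, b))    # do the two sets of breaks agree?

# The Sturges value is only used when 'breaks' is left at its default
nclass.Sturges(X)
```

If the two sets of breaks agree, the varying values of
length(H$breaks) would simply reflect how pretty() rounds the step
size for each variable's range.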

Thanks,
Ted.

> On Mon, May 19, 2008 at 5:31 AM, Ted Harding
> <Ted.Harding at manchester.ac.uk> wrote:
> 
>> Hi Folks,
>> I'd like to know how hist() decides how many cells to use
>> when it ignores my "suggestion" to use say 'hist(...,breaks=50)'.
>>
>> More specifically, I have the results of 10000 simulations,
>> each returning an 8-vector, therefore 8 variables each with
>> 10000 values. Some of these 8 have somewhat skew distributions.
>> Say one of these 8 variables is X.
>>
>> I ask for H <- hist(X,breaks=50), and get a histogram which
>> usually has a different number of cells than what I intended.
>>
>> For instance, for one of these simulations, the 8 different
>> values of length(H$breaks) are:
>>
>>  70, 44, 38, 68, 50, 40, 46, 45
>>
>> ?hist tells me
>>
>> A)
>>  breaks: one of:
>>    *  a vector giving the breakpoints between histogram
>>       cells,
>>    *  a single number giving the number of cells for the
>>       histogram,
>>    *  a character string naming an algorithm to compute the
>>       number of cells (see Details),
>>    *  a function to compute the number of cells.
>>
>>    In the last three cases the number is a suggestion only.
>>
>> B)
>>  The default for 'breaks' is '"Sturges"': see 'nclass.Sturges'.
>>
>> If I look at the code for nclass.Sturges() I see
>>
>>  function (x) ceiling(log2(length(x)) + 1)
>>
>> and, for length(X) = 10000, this gives 15. This is not related
>> to any of the numbers of breaks I actually got, in any way obvious
>> to me.
>>
>> So:
>> Question 1: hist() has apparently ignored my "suggestion" of
>>  "breaks=50". Why? What is the criterion for ignoring?
>>
>> Question 2: Presumably, if it ignores the "suggestion", it
>>  does something else, of its choice. I would then, perhaps,
>>  expect it to fall back to its default, which is (allegedly)
>>  Sturges. But the result from nclass.Sturges looks different
>>  from what it actually did. So what did it actually do, and
>>  how did it decide on this?
>>
>> With thanks,
>> Ted.
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 19-May-08                                       Time: 10:31:20
>> ------------------------------ XFMail ------------------------------
>>
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem you are trying to solve?

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 19-May-08                                       Time: 11:43:35
------------------------------ XFMail ------------------------------