[R] histogram first bar wrong position

peter dalgaard pdalgd at gmail.com
Fri Dec 23 19:32:08 CET 2016


> On 22 Dec 2016, at 18:08 , William Dunlap via R-help <r-help at r-project.org> wrote:
> 
> As a practical matter, 'continuous' data must be discretized, so if you
> have long vectors of it you will run into this problem.

Yep, and it is a bit unfortunate that hist() tries to use "pretty" breakpoints, so that integer data end up exactly on the bin boundaries, and all the left/right/endpoint business (the right= and include.lowest= arguments) comes into play. The truehist() function in MASS does somewhat better.
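To see where the default breakpoints land relative to the integer data, you can ask hist() for its computed breaks without drawing anything. A rough sketch, reusing the dice data from the original post; set.seed() is added only so the run is reproducible, and the exact breaks you get will depend on the sample size and range (by default hist() calls pretty() on the data range with the Sturges number of classes).

```r
# Dice data as in the original post; set.seed() only for reproducibility
set.seed(1)
y <- ceiling(runif(100) * 6)

# Let hist() choose its default breaks, without plotting
h <- hist(y, plot = FALSE)
h$breaks   # the "pretty" breakpoints chosen by hist()
h$counts   # counts per bin; some bins may well be empty

# Observed values sitting exactly on a breakpoint.
# Bins are right-closed by default (right = TRUE), so such a value is counted
# in the bin to its left, except that include.lowest = TRUE puts a value equal
# to the lowest breakpoint into the first bin.
sort(unique(y[y %in% h$breaks]))
```

Whatever the breaks turn out to be, any face that coincides with a breakpoint gets assigned by those boundary rules rather than sitting in the middle of a bar.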

For the case at hand, things are much improved by setting the breaks explicitly:

hist(y,freq=TRUE, col='red', breaks=0.5:6.5)

but as pointed out by others, it is a much better idea to do

plot(factor(y, levels=1:6))

or similar. 
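A small sketch of why both suggestions behave well: with half-integer breaks every face sits in the middle of its own bin, and a factor with explicit levels = 1:6 yields the same counts, including zero-height bars for faces that never came up. The sample size of 10 and the seed are arbitrary choices for illustration.

```r
# A small sample, so that some faces may well be missing
set.seed(42)
y <- ceiling(runif(10) * 6)

# Half-integer breaks: each bin is centred on one face
h <- hist(y, breaks = 0.5:6.5, plot = FALSE)
h$mids     # 1 2 3 4 5 6

# Counting via a factor with all six levels gives the same numbers,
# with zeros for faces that did not occur
tab <- table(factor(y, levels = 1:6))
tab

# plot() on a factor draws a barplot of its table, one bar per level
plot(factor(y, levels = 1:6), xlab = "Face", ylab = "Count")
```

Without the levels argument, table(y) and plot(factor(y)) would silently drop any face with zero count, shifting the remaining bars.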

Incidentally, what is the handiest way to get a plot with percentages instead of counts? This works, but seems a bit ham-fisted:

barplot(prop.table(table(factor(y, levels=1:6))))
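Two possible variations, offered only as sketches: scale the proportions by 100 and label the axis, or use hist() with freq = FALSE and unit-width bins. In the latter case the density (count divided by n times bin width) reduces to the proportion per face, so the y axis reads directly as proportions.

```r
set.seed(1)
y <- ceiling(runif(100) * 6)

# Proportion per face, keeping faces that never occurred
p <- prop.table(table(factor(y, levels = 1:6)))

# Percent scale: multiply by 100 and label the axis accordingly
barplot(100 * p, xlab = "Face", ylab = "Percent")

# Bins of width 1: density = counts / (n * 1) = proportion per face
h <- hist(y, breaks = 0.5:6.5, freq = FALSE,
          xlab = "Face", ylab = "Proportion", main = "")
```

The hist() route only gives proportions because every bin has width 1; with other bin widths the heights would be densities, not proportions.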

-pd

> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> On Thu, Dec 22, 2016 at 8:19 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> 
>>>>>>> itpro  <itpro1 at yandex.ru>
>>>>>>>    on Thu, 22 Dec 2016 16:17:28 +0300 writes:
>> 
>>> Hi, everyone.
>>> I stumbled upon weird histogram behaviour.
>> 
>>> Consider this "dice emulator":
>>> Step 1: Generate a uniform random vector x of size N.
>>> Step 2: Multiply each item by six and round up to the next integer to get numbers 1 to 6.
>>> Step 3: Plot histogram.
>> 
>>>> x<-runif(N)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='orange')
>> 
>> 
>>> Here is what I get with N=100000:
>> 
>>>> x<-runif(100000)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='green')
>> 
>>> At first glance looks OK.
>> 
>>> Now try N=100
>> 
>>>> x<-runif(100)
>>>> y<-ceiling(x*6)
>>>> hist(y,freq=TRUE, col='red')
>> 
>>> Now the first bar is not where it should be.
>>> Hmm. Look again at the 100000 histogram... The first bar is not where I want it; it's only less striking due to the narrow bars.
>> 
>>> So the first bar is always in the wrong position. How do I fix it to get perfectly spaced bars?
>> 
>> Don't use histograms *at all* for such discrete integer data.
>> 
>> N <- rpois(100, 5)
>> plot(table(N), lwd = 4)
>> 
>> Histograms should only be used for continuous data (or discrete data
>> with "many" possible values).
>> 
>> It's a pain to see them so often "misused" for data like the 'N' above.
>> 
>> Martin Maechler,
>> ETH Zurich
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


