[R] histogram first bar wrong position
pdalgd at gmail.com
Fri Dec 23 19:32:08 CET 2016
> On 22 Dec 2016, at 18:08 , William Dunlap via R-help <r-help at r-project.org> wrote:
> As a practical matter, 'continuous' data must be discretized, so if you
> have long vectors of it you will run into this problem.
Yep, and it is a bit unfortunate that hist() tries to use "pretty" breakpoints, so that you will have data points on the boundaries, causing all the left/right/endpoint business (the right= and include.lowest= arguments) to come into play. The truehist() function in MASS does somewhat better.
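To see the boundary issue concretely, here is a minimal sketch. It assumes the dice data are simulated as described in the original question quoted below; the names N, y, h, on_boundary and the seed are my own. It only inspects what hist() would use, without drawing anything:

```r
# Assumption: dice rolls simulated as in the original question
set.seed(1)                       # arbitrary seed, only for reproducibility
N <- 100
y <- ceiling(6 * runif(N))        # integers 1..6

# Compute the histogram without plotting, using the default breaks
h <- hist(y, plot = FALSE)

h$breaks                          # the "pretty" breakpoints chosen by hist()
on_boundary <- y %in% h$breaks    # observations sitting exactly on a breakpoint
mean(on_boundary)                 # proportion of the data on a bin boundary
h$counts                          # counts per bin, decided by right=/include.lowest=
```

Any observation that coincides with a breakpoint has to be assigned to the bin on one side or the other, which is where the uneven-looking bars come from.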
For the case at hand, things are much improved by setting the breaks explicitly:
hist(y,freq=TRUE, col='red', breaks=0.5:6.5)
but as pointed out by others, it is a much better idea to do
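A minimal sketch of that, along the lines of Martin Maechler's plot(table(N)) quoted below. Here y is simulated as an assumption, and the lwd value is simply copied from his example:

```r
# Assumption: y holds simulated dice rolls, as in the original question
set.seed(1)                  # arbitrary seed, only for reproducibility
y <- ceiling(6 * runif(100))

tab <- table(y)              # counts for each distinct face
plot(tab, lwd = 4)           # one vertical line per face, drawn at its value
```

Since the lines are drawn at the observed values themselves, no bin boundaries are involved at all.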
Incidentally, what is the most handy way to get a plot with percentages instead of counts? This works, but seems a bit ham-fisted:
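One possible way, not necessarily the one meant above, is to rescale the table before plotting. A sketch, again assuming y holds simulated dice rolls; the name pct and the axis label are my own choices:

```r
# Assumption: y holds simulated dice rolls, as in the original question
set.seed(1)                            # arbitrary seed, only for reproducibility
y <- ceiling(6 * runif(100))

pct <- 100 * prop.table(table(y))      # relative frequencies as percentages
plot(pct, lwd = 4, ylab = "Percent")   # same line plot, y-axis now in percent
```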
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> On Thu, Dec 22, 2016 at 8:19 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>>>>>>> itpro <itpro1 at yandex.ru>
>>>>>>> on Thu, 22 Dec 2016 16:17:28 +0300 writes:
>>> Hi, everyone.
>>> I stumbled upon weird histogram behaviour.
>>> Consider this "dice emulator":
>>> Step 1: Generate uniform random array x of size N.
>>> Step 2: Multiply each item by six and round up to the next integer to get numbers 1 to 6.
>>> Step 3: Plot histogram.
>>> > hist(y, freq=TRUE, col='orange')
>>> Now here is what I get with N=100000:
>>> > hist(y, freq=TRUE, col='green')
>>> At first glance it looks OK.
>>> Now try N=100:
>>> > hist(y, freq=TRUE, col='red')
>>> Now the first bar is not where it should be.
>>> Hmm. Look again at the N=100000 histogram... The first bar is not where I want it; it is only less striking because the bars are narrow.
>>> So the first bar is always in the wrong position. How do I fix it to get perfectly spaced bars?
>> Don't use histograms *at all* for such discrete integer data.
>> N <- rpois(100, 5)
>> plot(table(N), lwd = 4)
>> Histograms should only be used for continuous data (or discrete data
>> with "many" possible values).
>> It's a pain to see them so often "misused" for data like the 'N' above.
>> Martin Maechler,
>> ETH Zurich
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> PLEASE do read the posting guide http://www.R-project.org/
>> and provide commented, minimal, self-contained, reproducible code.
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com