[R] histogram first bar wrong position

William Dunlap wdunlap at tibco.com
Thu Dec 22 18:08:35 CET 2016


As a practical matter, 'continuous' data must be discretized, so if you
have long vectors of it you will run into this problem.
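
If a histogram is still wanted for such discretized values, one possible
workaround is to put the breaks halfway between the possible values, so that
no observation falls on a cell boundary. A minimal sketch for the dice example
quoted below; the seed and the names h_default and h_centered are only
illustrative, not something from the original post:

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
y <- ceiling(runif(100) * 6)      # the "dice" values 1..6 from the quoted post

# Default breaks: values that coincide with a break go into the cell to their
# left (right = TRUE), except the smallest one, which include.lowest = TRUE
# puts into the first cell. That is what makes the first bar look shifted.
h_default <- hist(y, plot = FALSE)

# Breaks at the half-integers: every value sits in the middle of its own cell.
h_centered <- hist(y, breaks = seq(0.5, 6.5, by = 1), plot = FALSE)
h_centered$mids                   # 1 2 3 4 5 6

hist(y, breaks = seq(0.5, 6.5, by = 1), freq = TRUE, col = "orange")
```

The same idea should carry over to rounded measurements in general: choose
breaks that lie between the representable values rather than on them.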

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Dec 22, 2016 at 8:19 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> >>>>> itpro  <itpro1 at yandex.ru>
> >>>>>     on Thu, 22 Dec 2016 16:17:28 +0300 writes:
>
>     > Hi, everyone.
>     > I stumbled upon weird histogram behaviour.
>
>     > Consider this "dice emulator":
>     > Step 1: Generate uniform random array x of size N.
>     > Step 2: Multiply each item by six and round up to the next integer
>     > to get numbers 1 to 6.
>     > Step 3: Plot histogram.
>
>     >> x<-runif(N)
>     >> y<-ceiling(x*6)
>     >> hist(y,freq=TRUE, col='orange')
>
>
>     > Now what I get with N=100000
>
>     >> x<-runif(100000)
>     >> y<-ceiling(x*6)
>     >> hist(y,freq=TRUE, col='green')
>
>     > At first glance looks OK.
>
>     > Now try N=100
>
>     >> x<-runif(100)
>     >> y<-ceiling(x*6)
>     >> hist(y,freq=TRUE, col='red')
>
>     > Now the first bar is not where it should be.
>     > Hmm. Look again at the 100000 histogram... The first bar is not where
>     > I want it; it's only less striking due to the narrow bars.
>
>     > So, the first bar is always in the wrong position. How do I fix it to
>     > get perfectly spaced bars?
>
> Don't use histograms *at all* for such discrete integer data.
>
>  N <- rpois(100, 5)
>  plot(table(N), lwd = 4)
>
> Histograms should only be used for continuous data (or discrete data
> with "many" possible values).
>
> It's a pain to see them so often "misused" for data like the 'N' above.
>
> Martin Maechler,
> ETH Zurich
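
Applied to the dice data from the original question, the table-based plot
suggested above might look like the sketch below. The seed, the name counts,
and the axis labels are assumptions made for illustration; factor() with
explicit levels is used so that a face that was never rolled still shows up
with a count of zero.

```r
set.seed(1)                                  # arbitrary seed
y <- ceiling(runif(100) * 6)                 # dice values as in the question

# Tabulate over all six faces, including any face that did not occur.
counts <- table(factor(y, levels = 1:6))

# One vertical line per face, evenly spaced on the x axis.
plot(counts, lwd = 4, xlab = "face", ylab = "frequency")

# Bar version of the same table, if filled bars are preferred:
# barplot(counts, col = "orange")
```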
>
