[Rd] Determining the break points by hist() leads to errors (PR#2432)

Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
Wed Jan 8 19:59:02 2003


volker.franz@tuebingen.mpg.de writes:

> Hi,
> 
> if I determine the break points using the hist() function and then try
> to re-use these in a new histogram, R fails. Here is an example of the
> problem:
> 
> ##First, plot a histogram: 
> data(islands)
> foo <- hist(islands,freq=T)
> 
> ##Now, try to plot it again, with the previously determined break points:
> hist(islands,breaks=foo$breaks,freq=T)
> 
> ##... this leads to the warning message:
> Warning message: 
> the AREAS in the plot are wrong -- rather use `freq=FALSE'!
> in: plot.histogram(r, freq = freq, col = col, border = border, angle = 
> 
> ##The reason for this seems to be that the breaks are NOT
> ##equidistant (despite foo$equidist being TRUE!):
> 
> > foo$breaks
> [1]    -0.0018  2000.0018  4000.0018  6000.0018  8000.0018 10000.0018
> [7] 12000.0018 14000.0018 16000.0018 18000.0018
> 
> ##Correcting this (by changing the first element of foo$breaks):
> corr.breaks <- c(+0.0018,2000.0018,4000.0018,6000.0018,8000.0018,
>                  10000.0018,12000.0018,14000.0018,16000.0018,18000.0018)
> 
> ##...leads to the desired result:
> hist(islands,breaks=corr.breaks,freq=T)
> 

...for your data. There's a reason why the first breakpoint is
adjusted in the opposite direction, namely to get exact zeros counted
into the first bin. Of course, since x in theory has a continuous
distribution, you should in theory never have observations on a
boundary, but in practice, theory and practice are not the same.
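
To illustrate the boundary issue, here is a small sketch that uses cut()
as a stand-in for the binning done inside hist(). The data and the 1e-7
offset are made up for the example, and this is not the actual hist()
code:

```r
x <- c(0, 1, 2)          # made-up data with a value exactly at 0
breaks <- c(0, 1, 2)

# Right-closed bins (0,1] and (1,2]: the exact zero lands in no bin at all
n_dropped <- sum(is.na(cut(x, breaks)))

# Nudging the first break slightly below zero pulls the zero into bin 1
# (the 1e-7 offset is an arbitrary choice for this illustration)
fuzzed <- c(breaks[1] - 1e-7, breaks[-1])
n_first <- sum(as.integer(cut(x, fuzzed)) == 1)
```

Here n_dropped counts the observation lost on the lower boundary, and
n_first counts the observations in the first bin once that boundary has
been moved down.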

So the proper fix would be different. Currently we have

    h <- diff(breaks)
    equidist <- !use.br || diff(range(h)) < 1e-07 * mean(h)

which likely needs a larger tolerance since 

    diddle <- 1e-07 * max(abs(range(breaks)))

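goes in both directions: the first break moves down and the others
move up, so the first bin ends up wider than the rest by 2 * diddle.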
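
As a rough check of the arithmetic, the two quoted expressions can be
evaluated on the breaks reported above. This only reproduces the
numbers; it assumes use.br is TRUE (breaks were supplied) and does not
propose a particular new tolerance:

```r
# Breaks as printed in the report (already fuzzed by hist())
breaks <- c(-0.0018, seq(2000, 18000, by = 2000) + 0.0018)

h      <- diff(breaks)                       # bin widths
spread <- diff(range(h))                     # widest minus narrowest bin
tol    <- 1e-07 * mean(h)                    # tolerance in the equidist test
diddle <- 1e-07 * max(abs(range(breaks)))    # size of the fuzz

equidist <- spread < tol    # FALSE for these breaks
ratio    <- spread / diddle # roughly 2: the fuzz enters the first bin twice
```

The fuzz scales with the largest break while the tolerance scales with
the mean bin width, so with more than a couple of bins the spread
exceeds the tolerance and the test fails.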

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907