[R] ggplot2 histograms... a subtle error found

Brian Diggs diggsb at ohsu.edu
Mon Aug 2 22:41:39 CEST 2010


On 7/28/2010 5:04 PM, Mike Williamson wrote:
> Hello all,
>
>      I have a peculiar and particular bug that I stumbled across with
> ggplot2.  I cannot seem to replicate it with anything other than my specific
> data set.
>
>      Here is the problem:
>
>     - when I try to plot a histogram, allowing for ggplot2 to decide the
>     binwidths itself, I get the following error:
>        - stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
>        adjust this.
>        - Error: position_stack requires constant width
>
>      My code is simply:
>
> ggplot(data=myDataSet, aes(x=myVarOI)) + geom_histogram()
>
> or
>
> qplot(myDataSet$myVarOI)
>
>      If I go ahead and set the binwidth to some value, then the plot can be
> made without problems.
>
>      The problem is with the specific data that it is trying to plot.  I
> suspect it is trying to create bins of different sizes, from the error
> code.  Here are the basics of my data set:
>
>     - length:  1936 entries
>     - 1906 unique entries
>     - stats:
>     -      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>     3.200e+09 6.312e+09 6.591e+09 6.874e+09 7.551e+09 1.083e+10
>
>
>
>      I cannot imagine this can be solved without my specifically uploading
> the actual data.  If I simply attach it, will it be received by r-help?
> Hadley, if you're interested, would you like me to send you the data
> directly to you?

I can reproduce it with generic data.  The problem is one of
floating-point rounding error.

ggplot(data=mtcars, aes(x=6.8e+09 + qsec)) + geom_histogram()
#stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
#Error during wrapup: position_stack requires constant width

When ggplot2 verifies the widths before stacking (the default position
for histograms), it computes the widths from the minimum and maximum
values for each bin.  However, because the width of the bins (0.28) is
much smaller than the scale of the edges (6.8e+09), the edges are
rounded at a scale that matters (doubles near 6.8e+09 are spaced about
9.5e-07 apart) and the widths don't all come out equal:

# in ggplot2::collide
with(data, xmax-xmin)
# [1] 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988
#[10] 0.2799988 0.2800007 0.2799988 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988 0.2800007
#[19] 0.2799988 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988 0.2799988 0.2800007 0.2799988
#[28] 0.2799988 0.2799988 0.2800007 0.2799988 0.2799988

unique(with(data, xmax - xmin))
#[1] 0.2799988 0.2800007

So ggplot2 concludes the widths are not equal and gives the error you 
see.  I don't think this is a bug; you are operating at the edge of what 
the floating point precision will allow, and seem to have crossed that 
edge in this case.  (I suppose ggplot2 could carry the information that 
the bins are created with equal widths and then not have to check that 
later, but that seems unnecessary overhead.)
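
To see the effect outside ggplot2, here is a minimal base R sketch.  The
break positions are made up for illustration (31 edges spaced a nominal
0.28 apart, starting near 6.8e+09) and are not the ones ggplot2
computes.  The tolerance check at the end is only one way such a
comparison could be written; it is an assumption on my part, not
ggplot2's actual code.

```r
# Hypothetical bin edges: nominal width 0.28, magnitude about 6.8e+09.
breaks <- 6.8e9 + 14.5 + 0.28 * (0:30)
widths <- diff(breaks)

# Spacing between adjacent doubles in [2^32, 2^33), which contains 6.8e+09.
spacing <- 2^32 * .Machine$double.eps   # 2^-20, about 9.5e-07

# Each edge is rounded to a multiple of that spacing, so the widths
# cannot all be exactly 0.28 and do not all agree with each other.
n_distinct <- length(unique(widths))    # more than 1
max_error  <- max(abs(widths - 0.28))   # smaller than one spacing unit

# Comparing with a relative tolerance instead of exact equality treats
# the widths as constant.  The tolerance 1e-5 is an arbitrary choice.
constant_width <- diff(range(widths)) <= 1e-5 * mean(widths)
```

With the same edges shifted down to the scale of qsec itself, the
rounding happens many orders of magnitude below the bin width, which is
why the problem only shows up for data with a large offset.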

There is a workaround, though.

ggplot(data=mtcars, aes(x=6.8e+09 + qsec)) +
   geom_histogram(position="identity")

gives what you want and does not require the widths to be equal.  If you
had more than one group, position="stack" and position="identity" would
be quite different, but they are equivalent for one group, so you can
get away with switching one for the other in this case.
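
The difference between the two positions can be sketched in base R
without drawing anything.  This is a conceptual illustration only, not
how ggplot2 implements either position: the grouping variable (am), the
break points, and the stacking order are my own choices for the example.

```r
# Bin mtcars$qsec with fixed breaks, separately for each value of am.
breaks <- seq(14, 23, by = 1)
counts <- sapply(split(mtcars$qsec, mtcars$am),
                 function(x) hist(x, breaks = breaks, plot = FALSE)$counts)

# position = "identity": every group's bar starts at zero.
top_identity <- counts
# position = "stack": each group's bar starts where the previous one ended.
top_stack <- t(apply(counts, 1, cumsum))

# The first (or only) group is drawn the same either way ...
first_same <- all(top_stack[, 1] == top_identity[, 1])
# ... but a later group is not, wherever an earlier group has a count.
later_differs <- any(top_stack[, 2] != top_identity[, 2])
```

With a single group there is no "later" group, so the two positions
produce the same bars.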

>                                            Regards,
>                                                   Mike

--
Brian Diggs
Senior Research Associate, Department of Surgery, Oregon Health & 
Science University


