[Rd] Bug: floating point bug in nclass.FD can cause hist() to crash

Sietse Brouwer sbbrouwer at gmail.com
Thu May 18 22:50:52 CEST 2017

Hello everybody,

This is a bug involving functions in core R packages:
graphics::hist.default, grDevices::nclass.FD, and
base::pretty.default. It is not yet on Bugzilla. I cannot submit it
myself, as I do not have an account. Could somebody else add it for
me, perhaps? That would be much appreciated.

Kind regards,

Sietse Brouwer


Floating point errors can cause a data vector to have an ultra-small
inter-quartile range, which causes `grDevices::nclass.FD` to suggest
an absurdly large number of breaks to `graphics::hist(breaks="FD")`.
Because this large float becomes NA when converted to integer, hist's
call to `base::pretty` crashes.

How could nclass.FD, which has the job of suggesting a reasonable number of
classes, avoid suggesting an absurdly large number of classes when the
inter-quartile range is absurdly small compared to the range?
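
One possible guard, sketched under stated assumptions rather than proposed as the fix: cap the suggested class count at some upper bound. The helper name `nclass_fd_capped` and the default `max_classes = 1e6` are hypothetical and chosen only for illustration; they are not part of grDevices. The formula is the usual Freedman-Diaconis rule (bin width = 2 * IQR * n^(-1/3)).

```r
# Hypothetical helper, NOT part of grDevices: a Freedman-Diaconis class count
# with an upper bound. `max_classes` is an arbitrary illustrative default.
nclass_fd_capped <- function(x, max_classes = 1e6) {
  h <- IQR(x)
  if (h <= 0) return(1L)          # degenerate spread: fall back to one class
  # Freedman-Diaconis: number of classes = range / (2 * IQR * n^(-1/3))
  n <- ceiling(diff(range(x)) / (2 * h * length(x)^(-1/3)))
  # Cap before converting, so the result always fits in an integer
  as.integer(min(n, max_classes))
}

x <- c(1, 1, 1, 1 + 1e-15, 2)
capped <- nclass_fd_capped(x)
print(capped)                     # the cap, not a value outside integer range
```

Whether a cap, a relative tolerance on the IQR, or rounding the data first is the better design is an open question; the sketch only shows that the class count can be kept inside integer range.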

Steps to reproduce

    hist(c(1, 1, 1, 1 + 1e-15, 2), breaks="FD")

Observed behaviour

Running this code gives the following error message:

    Error in pretty.default(range(x), n = breaks, min.n = 1):
      invalid 'n' argument
    In addition: Warning message:
    In pretty.default(range(x), n = breaks, min.n = 1) :
      NAs introduced by coercion to integer range

Expected behaviour

hist() should never crash when given valid numerical data. Specifically,
it should be robust even to those rare datasets where (through floating
point inaccuracy) the inter-quartile range is many orders of magnitude
smaller than the range.


Dramatis personae:

* graphics::hist.default

* grDevices::nclass.FD

* base::pretty.default

`nclass.FD` examines the inter-quartile range of `x`, and gets a positive, but
very small floating point value -- let's call it TINYFLOAT. It inserts this
ultra-low IQR into the `nclass` denominator, which means `nclass`
becomes a huge number -- let's call it BIGFLOAT. `nclass.FD` then returns this
huge value to `hist`.
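
The arithmetic can be reproduced by hand. The sketch below does not call `nclass.FD` itself (its implementation may differ between R versions); it applies the Freedman-Diaconis rule directly, which is an assumption about what the function computes. The names `tinyfloat`, `bin_width` and `bigfloat` are illustrative.

```r
# The data from the report: the fourth value differs from 1 by about 1e-15.
x <- c(1, 1, 1, 1 + 1e-15, 2)

# TINYFLOAT: with five points, the default quartiles fall on the 2nd and 4th
# sorted values, so the IQR is just that tiny floating point offset.
tinyfloat <- IQR(x)

# Freedman-Diaconis: bin width = 2 * IQR * n^(-1/3); classes = range / width.
bin_width <- 2 * tinyfloat * length(x)^(-1/3)
bigfloat  <- ceiling(diff(range(x)) / bin_width)

print(tinyfloat)                        # positive, on the order of 1e-15
print(bigfloat)                         # hundreds of trillions of classes
print(bigfloat > .Machine$integer.max)  # TRUE: too large for an R integer
```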

Once `hist` has its 'number of breaks' suggestion, it feeds this
number to `pretty`:

    pretty(range(x), BIGFLOAT, min.n = 1)

`pretty`, in turn, calls

    .Internal(pretty(min(x), max(x), BIGFLOAT, min.n, shrink.sml,
        c(high.u.bias, u5.bias), eps.correct))

This call fails with the error and warning shown at the start of this e-mail
(invalid 'n' argument / NAs introduced by coercion to integer range). My
reading is that .Internal tried to coerce BIGFLOAT to an integer, produced an
NA because the value lies outside the integer range, and that (the C
implementation of) `pretty`, in turn, choked when confronted with NA.
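
The coercion step can be observed at the R level. This is only an analogue: `as.integer` is used here as a stand-in for whatever conversion the C code performs, which has not been traced. The value `7.7e14` is an illustrative stand-in for BIGFLOAT, chosen only because it is far above `.Machine$integer.max`.

```r
# Illustrative stand-in for BIGFLOAT: any double far above the integer limit.
bigfloat <- 7.7e14

# Coercing an out-of-range double to integer yields NA (with a warning).
n <- suppressWarnings(as.integer(bigfloat))
print(is.na(n))   # TRUE

# Capture the warning text instead of suppressing it.
msg <- tryCatch(as.integer(bigfloat),
                warning = function(w) conditionMessage(w))
print(msg)
```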

