[R] Problem with cut

jim holtman jholtman at gmail.com
Sat Feb 23 00:58:29 CET 2008


One way of finding out is to look at the code for cut.default.  Here
is the result of tracing through it where it determines where the cuts
are for 12 equal spacings:

D(2)>
 [1] 149.804 166.170 182.536 198.902 215.268 231.634 248.000 264.366 280.732 297.098 313.464 329.830
[13] 346.196

As you can see, one of the breakpoints is at 329.830, which is why 330
is in the (330,346] category.  The statements in the function that do
this are:

    if (length(breaks) == 1) {
        if (is.na(breaks) | breaks < 2)
            stop("invalid number of intervals")
        nb <- as.integer(breaks + 1)
        dx <- diff(rx <- range(x, na.rm = TRUE))
        if (dx == 0)
            dx <- abs(rx[1])
        breaks <- seq.int(rx[1] - dx/1000, rx[2] + dx/1000, length.out = nb)
    }

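You can see there is a small fudge factor (dx/1000) applied to both
ends to make sure all the data is included, so the breakpoints do not
land on round values.  The labels are rounded to three significant
digits by default (dig.lab = 3), so the breakpoint 329.830 is shown as
330.  That is what causes the perceived problem.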
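
As a rough sketch, the computation can be replayed by hand with the
statements quoted above.  The names brk and cc are only for
illustration, and the breaks are passed to cut explicitly so the result
does not depend on how a particular R version computes them internally:

```r
# Replay the break computation from the quoted statements (12 intervals)
vv <- seq(150, 346, by = 4)
rx <- range(vv, na.rm = TRUE)    # 150 346
dx <- diff(rx)                   # 196
brk <- seq.int(rx[1] - dx/1000, rx[2] + dx/1000, length.out = 13)

brk[12]          # the breakpoint just below 330
330 > brk[12]    # TRUE, so 330 lies in the last interval

# Ask for more digits in the labels so the real breakpoints are visible
cc <- cut(vv, brk, dig.lab = 6)
as.character(cc[vv == 330])
```

With dig.lab = 6 the label for 330 shows the unrounded lower breakpoint
instead of 330, which makes it clearer why the value lands in the last
category.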

On Fri, Feb 22, 2008 at 8:21 AM,  <Jussi.Lehto at ubs.com> wrote:
> Hi All,
>
> I might have misunderstood how cut works, but the following behaviour
> surprises me.
>
> vv <- seq(150, 346, by= 4)
> cc <- cut(vv, 12)
> cc[vv == 330]
> Results [1] (330,346]
>
> I would have expected 330 to fall into the (313,330] category.
>
> Can you please advise what I am doing wrong?
>
> Many Thanks,
> Jussi Lehto