[Rd] cut takes a long time

Deepayan Sarkar deepayan.sarkar at gmail.com
Thu Jun 17 07:50:02 CEST 2010


On Wed, Jun 16, 2010 at 3:56 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> The following cut command takes nearly 10 seconds on my machine even
> though the input vector has only 6 elements.  I am running on Windows
> Vista with C2D BLAS using R 2.11.1.  Using the default BLAS and either
> R 2.10.1 or "R version 2.12.0 Under development (unstable) (2010-05-31
> r52164)" also gives me results in the 9-11 second range.
> I would have expected it to take much less time.
>
>
> tt <- structure(c(631206000, 631206060, 631206180, 631206240, 631206300,
> 978224400), class = c("POSIXt", "POSIXct"), tzone = "")
>
> system.time(cut(tt, "2 hours", include = TRUE)) # 9.45  0.01  9.58

The POSIXt aspect is not relevant here; what matters is the number of breakpoints.

> system.time(cut(tt, "2 hours", include = TRUE))
   user  system elapsed
  5.884   0.108   6.033
> system.time(cut(rnorm(6), breaks = 50000))
   user  system elapsed
  5.200   0.000   5.558
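To see why the POSIXct call behaves like the 50000-break numeric call, the sketch below counts the breakpoints that a "2 hours" spacing implies over the span of the six timestamps, without calling cut() at all. It assumes that cut.POSIXt's breaks are close to what seq(min, max, by = "2 hours") produces; cut.POSIXt may align the starting point differently, so treat the count as an approximation.

```r
# Same six timestamps as in the original report (seconds since the epoch)
tt <- as.POSIXct(c(631206000, 631206060, 631206180, 631206240, 631206300,
                   978224400), origin = "1970-01-01", tz = "")

# Span covered by the data, in days (about 11 years)
span_days <- as.numeric(difftime(max(tt), min(tt), units = "days"))

# Approximate number of breakpoints implied by a "2 hours" spacing
n_breaks <- length(seq(min(tt), max(tt), by = "2 hours"))

span_days
n_breaks
```

Six data points spread over roughly 11 years give on the order of 48,000 two-hour intervals, which is the same scale as breaks = 50000 in the numeric example.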

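And the time seems to grow linearly with the number of breakpoints, which
is not surprising. The "Note" section in ?cut does point to more efficient
alternatives: findInterval() instead of cut(*, labels = FALSE), and
hist(x, br, plot = FALSE) instead of table(cut(x, br)).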
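A minimal sketch of the findInterval() alternative, assuming an explicit break vector (the 50001-point `br` below is a hypothetical choice, not the breaks cut() computes internally for breaks = 50000). With right = FALSE, cut() uses left-closed intervals, which is also findInterval()'s default, so the integer codes should match.

```r
set.seed(42)
x <- rnorm(6)

# Hypothetical break vector: 50001 equally spaced points just covering x,
# i.e. 50000 intervals, comparable to cut(x, breaks = 50000)
br <- seq(min(x) - 0.01, max(x) + 0.01, length.out = 50001)

# Integer interval codes via cut(): left-closed intervals, no labels
codes_cut <- cut(x, br, right = FALSE, labels = FALSE)

# Same codes via findInterval(), which is left-closed by default
codes_find <- findInterval(x, br)

# Rough timing comparison (results vary by machine and R version)
print(system.time(for (i in 1:100) cut(x, br, right = FALSE, labels = FALSE)))
print(system.time(for (i in 1:100) findInterval(x, br)))
```

Values outside the range of `br` are handled differently (cut() returns NA, findInterval() returns 0 or the number of breaks), so the codes only agree for data inside the breaks, as here.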

Note that

> system.time(cut(tt, "2 hours", include = TRUE, labels = FALSE))
   user  system elapsed
   0.02    0.00    0.02

so it's the conversion to factors that seems to take most of the time.
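If a factor is still wanted, one possible workaround (an assumption, not something ?cut prescribes) is to compute the integer codes cheaply and then create levels only for the intervals that actually contain data. Unlike cut(), this drops empty levels, and the labels use a simplified sprintf() format rather than cut()'s own formatting.

```r
set.seed(42)
x <- rnorm(6)

# Hypothetical break vector with 50000 intervals, as in the sketch above
br <- seq(min(x) - 0.01, max(x) + 0.01, length.out = 50001)

# Cheap step: integer interval codes (left-closed intervals)
codes <- findInterval(x, br)

# Only the intervals that actually occur get a level; enough digits are
# used so that distinct intervals get distinct labels
used <- sort(unique(codes))
labs <- sprintf("[%.8g,%.8g)", br[used], br[used + 1L])

# Factor with one level per occupied interval, not one per break
f <- factor(codes, levels = used, labels = labs)

nlevels(f)
table(f)
```

The factor then has at most length(x) levels instead of tens of thousands, which is where the labels = FALSE timing suggests the cost lies.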

-Deepayan



More information about the R-devel mailing list