[Rd] [bug] in cut.POSIXt(..., breaks = <numeric>)

Xianghui Dong xhdong at umd.edu
Thu Apr 6 21:37:16 CEST 2017


The exact error was reported before in Bug 14288
<https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14288>, "bug in
cut.POSIXt(..., breaks = <numeric>) and cut.Date". But the fix in that bug
report only covered the simplest case.

This is the error I ran into:
-----------------------------

x <- structure(c(1057067700, 1057215720, 1060597800, 1061470800, 1061911680,
                 1062048000, 1062137880, 1064479440, 1064926380, 1064995140,
                 1066822800, 1068033720, 1070869740, 1070939820, 1071030540,
                 1074244560, 1077545880, 1078449720, 1084955460, 1129020000,
                 1130324280, 1130404800, 1131519420, 1132640100, 1133772000,
                 1137567960, 1138952640, 1141810380, 1147444200, 1161643440,
                 1164086160),
               class = c("POSIXct", "POSIXt"), tzone = "UTC")

> cut(x, 20)
Error in `levels<-.factor`(`*tmp*`, value = as.character(if
(is.numeric(breaks)) x[!duplicated(res)] else breaks[-length(breaks)])) :
  number of levels differs
-----------------------------

The cause of the bug is that the input has widely spread date-time values,
so only 10 of the 20 intervals actually contain a value.
-------------------

cut_n <- cut(as.numeric(x), 20)

> unique(cut_n)
 [1] (1.057e+09,1.062e+09] (1.062e+09,1.068e+09] (1.068e+09,1.073e+09]
(1.073e+09,1.078e+09]
 [5] (1.084e+09,1.089e+09] (1.127e+09,1.132e+09] (1.132e+09,1.137e+09]
(1.137e+09,1.143e+09]
 [9] (1.143e+09,1.148e+09] (1.159e+09,1.164e+09]
20 Levels: (1.057e+09,1.062e+09] (1.062e+09,1.068e+09]
(1.068e+09,1.073e+09] ... (1.159e+09,1.164e+09]
------------------------
To get a proper label for each of the 20 intervals, the breaks need to be
converted from numbers to date-time strings. The current code does not
actually convert the breaks; it just reuses the original date-time values
from the input data. This fails whenever the interval boundaries do not line
up with values in the original input, for instance when some intervals are
empty. An even simpler example, from the original bug report:
-----------------------
x <- seq(as.POSIXct("2000-01-01"), by = "days", length = 20)
> cut(x, breaks = 30)
Error in `levels<-.factor`(`*tmp*`, value = as.character(if
(is.numeric(breaks)) x[!duplicated(res)] else breaks[-length(breaks)])) :
  number of levels differs
---------------------
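The mismatch behind this error can be checked on the numeric values directly.
The snippet below is only a sketch, assuming a UTC time zone for
reproducibility (the example above uses the session time zone). It compares
the number of levels that "cut" creates with the number of intervals that
actually contain a value, which is how many labels x[!duplicated(res)] can
supply according to the error message.

```r
# Daily sequence as in the example above; tz = "UTC" is an assumption made
# here so the result does not depend on the session time zone.
x <- seq(as.POSIXct("2000-01-01", tz = "UTC"), by = "days", length.out = 20)

cut_n <- cut(as.numeric(x), 30)

n_levels <- nlevels(cut_n)             # one level per interval, empty or not
n_occupied <- sum(!duplicated(cut_n))  # intervals that contain at least one value

print(n_levels)
print(n_occupied)  # cannot exceed length(x), i.e. 20
```

With 20 input values there can never be more than 20 occupied intervals, so
30 levels cannot be labelled from the input values alone.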

I think fixing the bug will require either
- getting the actual numeric values of the breaks from "cut" (modifying "cut"
if needed) and then converting those numeric values back to date-time,
- or using a regex to extract the break values from the labels and then
converting them to date-time.
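
The first option could look roughly like the sketch below. This is not a
patch for cut.POSIXt and not the code R uses; cut_posixct_n is a hypothetical
helper name, and the equal-width breaks with a small padding of the range are
an assumption made for illustration. It also assumes the input has at least
two distinct times and that the intervals are wide enough for the formatted
labels to be unique.

```r
# Hypothetical helper, not part of base R: cut a POSIXct vector into n
# equal-width intervals and label each interval by its lower break.
cut_posixct_n <- function(x, n) {
  stopifnot(inherits(x, "POSIXct"), length(n) == 1L, n >= 2)
  num <- as.numeric(x)
  rx <- range(num, na.rm = TRUE)
  dx <- diff(rx)
  stopifnot(dx > 0)  # assumption: x contains at least two distinct times
  # Pad the range slightly so the minimum and maximum fall inside an interval.
  brk <- seq(rx[1L] - dx / 1000, rx[2L] + dx / 1000, length.out = n + 1L)
  tz <- attr(x, "tzone")
  if (is.null(tz)) tz <- ""
  # Convert the numeric lower breaks back to date-time strings, so every
  # interval gets a label whether or not it contains an input value.
  lower <- as.POSIXct(brk[-length(brk)], origin = "1970-01-01", tz = tz[1L])
  labs <- format(lower, "%Y-%m-%d %H:%M:%S", tz = tz[1L])
  cut(num, breaks = brk, labels = labs)
}

x <- seq(as.POSIXct("2000-01-01", tz = "UTC"), by = "days", length.out = 20)
res <- cut_posixct_n(x, 30)
print(nlevels(res))
print(head(levels(res), 3))
```

Because the labels are derived from the breaks rather than from the data, the
number of labels always matches the number of intervals.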

Best,
Xianghui Dong
