[Rd] cut.Date and cut.POSIXt problem

Roger D. Peng rpeng at jhsph.edu
Wed Mar 26 14:18:19 CET 2008


I have applied these patches to R-devel and in my limited testing they appear to 
work as desired.  I have to say that I never ran into the problem these patches 
were meant to solve, so I may not be the best person to do the testing.

-roger

Marc Schwartz wrote:
> Hi all,
> 
> Apologies for the delay in my engaging in this thread. I was traveling 
> this week.
> 
> The problem that Gabor raised was caused by the patch that I submitted 
> to fix a problem with the referenced functions when using 'months' and 
> 'years' as the interval. The prior versions were problematic:
> 
>   https://stat.ethz.ch/pipermail/r-devel/2008-January/048004.html
> 
> The patch fixed the error, but since I used hist.Date() as the reference 
> model and did not note the subtle difference in cut.Date() relative to 
> specifying the breaks increment value, this functionality was lost when 
> the same modification was made to the code in cut.Date().
> 
> Roger's patch helps, but does not totally remedy the situation. One also 
> needs to modify the method used for specifying the max value 'end' for 
> the breaks in order to include the max 'x' Date value in the result.
> 
> Hence, I am attaching proposed patches against R-devel for 
> base:::dates.R and base:::datetime.R.
> 
> I am also attaching a patch for tests:::reg-tests-1.R to add a check for 
> this situation to the regression tests that were also added subsequent 
> to that prior set of patches that I had submitted.
> 
> If perhaps Roger and Gabor could do some testing on these patches before 
> they are considered for inclusion into the R-devel tree, it would be 
> helpful to check to see if I have missed something else here.
> 
> Thanks for raising this issue.
> 
> Regards,
> 
> Marc Schwartz
> 
> Roger D. Peng wrote:
>> Seems changes in r44116 force the interval to be single months (or 
>> years) instead of whatever the user specified.  I think the attached 
>> patches correct this.
>>
>> Interestingly, 'cut' and 'seq' allow for the 'breaks' specification to 
>> be something like "3 months" but the documentation for 'hist' does not 
>> allow for this type of specification.
>>
>> -roger
>>
>> Gabor Grothendieck wrote:
>>> cut.Date and cut.POSIXt indicate that the breaks argument
>>> can be an integer followed by a space followed by "year", etc.
>>> but it seems the integer is ignored.
>>>
>>> For example, I assume that breaks = "3 months" is supposed
>>> to cut it into quarters but, in fact, it cuts it into months as if
>>> 3 had not been there.
>>>
>>>> d <- seq(Sys.Date(), length = 12, by = "month")
>>>> cut(d, "3 months")
>>>  [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
>>> 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
>>> Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
>>> 2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
>>> 2009-02-01
>>>> cut(as.POSIXct(d), "3 months")
>>>  [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
>>> 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
>>> Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
>>> 2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
>>> 2009-02-01
>>>> cut(as.POSIXlt(d), "3 months")
>>>  [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
>>> 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
>>> Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
>>> 2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
>>> 2009-02-01
>>>
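
The report above depends on Sys.Date(), so its output changes from day to day. 
A minimal reproducible sketch of the same check, using fixed dates, is given 
here; the dates and variable names are illustrative and are not taken from the 
original report. In a version of R that honours the leading integer in 
'breaks', this should give four quarterly levels with three dates in each, 
rather than the twelve monthly levels shown above.

```r
# Twelve month starts, fixed so that the result is reproducible
d <- seq(as.Date("2008-03-01"), by = "month", length.out = 12)

# The leading integer in 'breaks' should produce 3-month bins
q <- cut(d, "3 months")

levels(q)
table(q)
```

Only the Date method is exercised here, because the POSIXct and POSIXlt 
results can also depend on the local time zone.
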
> 
> ------------------------------------------------------------------------
> 
> --- datesORIG.R	2008-03-20 14:25:13.000000000 -0500
> +++ dates.R	2008-03-20 14:38:21.000000000 -0500
> @@ -322,17 +322,19 @@
>  	if(valid == 3) {
>          start$mday <- 1
>          end <- as.POSIXlt(max(x, na.rm = TRUE))
> -        end <- as.POSIXlt(end + (31 * 86400))
> +        step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
> +        end <- as.POSIXlt(end + (31 * step * 86400))
>          end$mday <- 1
> -        breaks <- as.Date(seq(start, end, "months"))
> +        breaks <- as.Date(seq(start, end, breaks))
>      } else if(valid == 4) {
>          start$mon <- 0
>          start$mday <- 1
>          end <- as.POSIXlt(max(x, na.rm = TRUE))
> -        end <- as.POSIXlt(end + (366 * 86400))
> +        step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
> +        end <- as.POSIXlt(end + (366 * step * 86400))
>          end$mon <- 0
>          end$mday <- 1
> -        breaks <- as.Date(seq(start, end, "years"))
> +        breaks <- as.Date(seq(start, end, breaks))
>      } else {
>          start <- .Internal(POSIXlt2Date(start))
>          if (length(by2) == 2) incr <- incr * as.integer(by2[1])
> 
> 
> ------------------------------------------------------------------------
> 
> --- datetimeORIG.R	2008-03-20 14:25:20.000000000 -0500
> +++ datetime.R	2008-03-20 15:25:49.000000000 -0500
> @@ -727,17 +727,19 @@
>  	if(valid == 6) {
>          start$mday <- 1
>          end <- as.POSIXlt(max(x, na.rm = TRUE))
> -        end <- as.POSIXlt(end + (31 * 86400))
> +        step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
> +        end <- as.POSIXlt(end + (31 * step * 86400))
>          end$mday <- 1
> -        breaks <- seq(start, end, "months")
> +        breaks <- seq(start, end, breaks)
>      } else if(valid == 7) {
>          start$mon <- 0
>          start$mday <- 1
>          end <- as.POSIXlt(max(x, na.rm = TRUE))
> -        end <- as.POSIXlt(end + (366 * 86400))
> +        step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
> +        end <- as.POSIXlt(end + (366 * step* 86400))
>          end$mon <- 0
>          end$mday <- 1
> -        breaks <- seq(start, end, "years")
> +        breaks <- seq(start, end, breaks)
>      } else {
>          if (length(by2) == 2) incr <- incr * as.integer(by2[1])
>  	    maxx <- max(x, na.rm = TRUE)
> 
> 
> ------------------------------------------------------------------------
> 
> --- reg-tests-1ORIG.R	2008-03-20 09:18:19.000000000 -0500
> +++ reg-tests-1.R	2008-03-20 15:15:56.000000000 -0500
> @@ -5025,7 +5025,7 @@
>  ## was about 0.0005 in 2.6.1 patched
>  
>  
> -## tests of problems fixed by Marc Schwarz's patch for
> +## tests of problems fixed by Marc Schwartz's patch for
>  ## cut/hist for Dates and POSIXt
>  Dates <- seq(as.Date("2005/01/01"), as.Date("2009/01/01"), "day")
>  months <- format(Dates, format = "%m")
> @@ -5036,20 +5036,32 @@
>  stopifnot(identical(hist(Dates, "month", plot = FALSE)$counts, mn))
>  # Test cut.Date() for months
>  stopifnot(identical(as.vector(table(cut(Dates, "month"))), mn))
> +# Test cut.Date() for 3 months
> +stopifnot(identical(as.vector(table(cut(Dates, "3 months"))),
> +                    as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))))
>  # Test hist.Date() for years
>  stopifnot(identical(hist(Dates, "year", plot = FALSE)$counts, ty))
>  # Test cut.Date() for years
>  stopifnot(identical(as.vector(table(cut(Dates, "years"))),ty))
> +# Test cut.Date() for 3 years
> +stopifnot(identical(as.vector(table(cut(Dates, "3 years"))),
> +                    as.integer(colSums(matrix(c(ty, 0), nrow = 3)))))
>  
>  Dtimes <- as.POSIXlt(Dates)
>  # Test hist.POSIXt() for months
>  stopifnot(identical(hist(Dtimes, "month", plot = FALSE)$counts, mn))
>  # Test cut.POSIXt() for months
>  stopifnot(identical(as.vector(table(cut(Dtimes, "month"))), mn))
> +# Test cut.POSIXt() for 3 months
> +stopifnot(identical(as.vector(table(cut(Dtimes, "3 months"))),
> +                    as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))))
>  # Test hist.POSIXt() for years
>  stopifnot(identical(hist(Dtimes, "year", plot = FALSE)$counts, ty))
>  # Test cut.POSIXt() for years
>  stopifnot(identical(as.vector(table(cut(Dtimes, "years"))), ty))
> +# Test cut.POSIXt() for 3 years
> +stopifnot(identical(as.vector(table(cut(Dtimes, "3 years"))),
> +                    as.integer(colSums(matrix(c(ty, 0), nrow = 3)))))
>  ## changed in 2.6.2
>  
>  
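
For readers who want to see what the month branch of the dates.R patch 
computes without applying it, the following is a rough standalone sketch. The 
function name month_breaks() is hypothetical and is not part of base R; the 
body only mirrors the patched lines quoted above (parse the optional integer 
from 'breaks', pad the end by 31 * step days, snap to the first of the month, 
then call seq() with the user's original 'breaks' string). It is meant as an 
illustration of the idea, not as the code that went into R.

```r
# Hypothetical helper (not in base R) mirroring the patched "months" branch
month_breaks <- function(x, breaks = "3 months") {
    # "3 months" -> c("3", "months"); a bare "month" has length 1
    by2 <- strsplit(breaks, " ", fixed = TRUE)[[1L]]
    step <- if (length(by2) == 2L) as.integer(by2[1L]) else 1L

    # First break: the first day of the month containing min(x)
    start <- as.POSIXlt(min(x, na.rm = TRUE))
    start$mday <- 1L

    # Pad the end by 'step' months' worth of days so that the last
    # interval can extend past max(x), then snap to the first of the month
    end <- as.POSIXlt(max(x, na.rm = TRUE))
    end <- as.POSIXlt(end + 31 * step * 86400)
    end$mday <- 1L

    # Use the user's interval rather than a hard-coded "months"
    as.Date(seq(start, end, breaks))
}

x <- seq(as.Date("2008-03-26"), by = "month", length.out = 12)
b <- month_breaks(x, "3 months")
b
```

The point of scaling the padding by 'step' is that the final break should 
fall after max(x), so that the largest date lands inside the last interval 
instead of beyond it.
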

-- 
Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/


