[R] cut.POSIXt misconception/feature/bug?

Brian Diggs diggsb at ohsu.edu
Wed Mar 10 20:47:49 CET 2010


On 3/10/2010 1:01 AM, Petr PIKAL wrote:
> Dear all
> recently I tried to split vector of dates according to some particular 
> date to 2 (more) chunks, but I was not able to perform correct setting.
> 
> When I want split to 3 chunks it partially works however from help page I 
> supposed to get result without NA.
> 
> Details:
> 
>      Using both ‘right = TRUE’ and ‘include.lowest = TRUE’ will
>      include both ends of the range of dates.
> 
> dat <- seq(c(ISOdate(2000,3,20)), by = "day", length.out = 60)
> br<-dat[c(23, 42)]
> head(cut(dat, breaks=br, right=T, include.lowest=T))
> 
> [1] <NA> <NA> <NA> <NA> <NA> <NA>
> Levels: 2000-04-11 14:00:00
> 
> which apparently is not output I would like to have.

The breaks argument does not work the way you think it does.  The breaks are the boundaries of the groups, so to get n groups you need n+1 breaks.  Any data outside the range of your breakpoints will be set to NA.  To make sure all the data are included, your breaks must include the extreme values of what you are cutting.

br <- dat[c(1,23,42,60)]
cut(dat, breaks=br, right=TRUE, include.lowest=TRUE)
# [1] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
# [4] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
# [7] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[10] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[13] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[16] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[19] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[22] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-04-11 05:00:00
#[25] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[28] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[31] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[34] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[37] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[40] 2000-04-11 05:00:00 2000-04-11 05:00:00 2000-04-11 05:00:00
#[43] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[46] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[49] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[52] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[55] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[58] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#Levels: 2000-03-20 04:00:00 2000-04-11 05:00:00 2000-04-30 05:00:00
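
The same rule can be checked on a plain numeric vector, where the counts are easy to verify by hand. This is only a minimal sketch; the vector x and the names inner and full are made up for illustration and are not from the original question.

```r
x <- 1:10

# Two breaks define a single interval, [4, 7]; everything outside it is NA.
inner <- cut(x, breaks = c(4, 7), right = TRUE, include.lowest = TRUE)
sum(is.na(inner))

# Adding the extremes gives four breaks, three intervals, and no NA.
full <- cut(x, breaks = c(1, 4, 7, 10), right = TRUE, include.lowest = TRUE)
table(full, useNA = "ifany")
```

With include.lowest = TRUE the first interval is closed on the left as well, which is why the value 1 is not lost.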

> When trying to split to 2 chunks there is a strange error
> 
> br<-dat[42]
> cut(dat, breaks=br, right=T, include.lowest=T)
> Error in cut.default(unclass(x), unclass(breaks), labels = labels, right = 
> right,  :  cannot allocate vector of length 955454401

To get 2 chunks, you need 3 breaks:

br <- dat[c(1,42,60)]
cut(dat, breaks=br, right=TRUE, include.lowest=TRUE)
# [1] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
# [4] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
# [7] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[10] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[13] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[16] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[19] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[22] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[25] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[28] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[31] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[34] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[37] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[40] 2000-03-20 04:00:00 2000-03-20 04:00:00 2000-03-20 04:00:00
#[43] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[46] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[49] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[52] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[55] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#[58] 2000-04-30 05:00:00 2000-04-30 05:00:00 2000-04-30 05:00:00
#Levels: 2000-03-20 04:00:00 2000-04-30 05:00:00
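
If the default date-time level names are hard to read, a variation along these lines may help. It is a sketch, assuming you are happy to take the extremes from min() and max() and to supply your own labels; the label names "early" and "late" are invented for illustration.

```r
dat <- seq(ISOdate(2000, 3, 20), by = "day", length.out = 60)

# Three breaks (both extremes plus the split date) give two groups.
br <- c(min(dat), dat[42], max(dat))
grp <- cut(dat, breaks = br, labels = c("early", "late"),
           right = TRUE, include.lowest = TRUE)
table(grp)

# For a two-way split, a plain comparison gives the same grouping.
all((grp == "early") == (dat <= dat[42]))
```

Because right = TRUE, the split date itself (element 42) lands in the first group.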

> I traced it back to 
> 
> Browse[5]> nb
> [1] 955454401
> ^^^^^^^^^^^^^^^^^^^^^^
> Browse[5]> 
> debug: NULL
> Browse[5]> 
> debug: breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000, length.out = 
> nb)
> Browse[5]> 
> Error in cut.default(unclass(x), unclass(breaks), labels = labels, right = 
> right,  : 
>   cannot allocate vector of length 955454401
> 
> which is probably not correct.

If you give breaks a single number, it is interpreted as the "number giving the number of intervals which x is to be cut into."  Since you need one more break than groups, a breaks vector of length 1 is not meaningful as a set of boundaries, so it is overloaded to mean the number of groups wanted in the end.  Your single date was unclassed to its underlying number of seconds, and as you saw, nb came out as 955454401, so cut tried to build that many evenly spaced break points (roughly 955 million groups).  That was too large to allocate, which gave the error you saw.
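
A small sketch of the two pieces involved, without triggering the allocation itself: a single-number breaks on an ordinary numeric vector, and the number that sits underneath a single date. The vector x is made up for illustration.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# A single number is read as "how many intervals", not as a boundary.
three <- cut(x, breaks = 3)
nlevels(three)

# A POSIXct value is stored as seconds since 1970-01-01 UTC, so a single
# date handed to breaks looks like a request for a huge number of intervals.
dat <- seq(ISOdate(2000, 3, 20), by = "day", length.out = 60)
secs <- as.numeric(dat[42])
secs
```

Calling cut(dat, breaks = dat[42]) itself is left out here on purpose, since it may try to allocate a very large vector.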

> Can somebody help me to the right track?
> 
> 
>> version
>                _  
> platform       i386-pc-mingw32  
> arch           i386  
> os             mingw32  
> system         i386, mingw32  
> status         Under development (unstable)  
> major          2  
> minor          11.0  
> year           2010  
> month          03  
> day            09  
> svn rev        51229  
> language       R  
> version.string R version 2.11.0 Under development (unstable) (2010-03-09 
> r51229)
> 
> Regards
> Petr


--
Brian Diggs, Ph.D.
Senior Research Associate, Department of Surgery, Oregon Health & Science University





