[R] cut POSIX results in NA - bug?

Petr Pikal petr.pikal at precheza.cz
Wed Nov 3 16:51:01 CET 2004


Dear Prof. Ripley

Thank you very much for the explanation (without it I would not have
considered that include.lowest had anything to do with my observation).
I have changed my code to get rid of the single final POSIX date.
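
One way to form hourly groups without cut() is sketched below. It is only an illustration under stated assumptions, not necessarily the change made above: each time is labelled with the clock hour it falls in via format(), and tapply() then averages per hour. The measurement vector `value` is hypothetical. The hours are computed in GMT (the zone ISOdate() uses by default) to avoid daylight-saving switches, so they are labelled 10:00 and 11:00 GMT rather than 12:00 and 13:00 local summer time.

```r
# Sketch (assumption): group minute data by clock hour using format()
# instead of cut(), so a lone value at the start of a new hour forms
# its own group rather than becoming NA or joining the previous hour.
datum <- seq(ISOdate(2004, 8, 31), ISOdate(2004, 9, 1), "min")
x <- datum[1321:1381]            # 61 time points, 10:00 to 11:00 GMT
value <- seq_along(x)            # hypothetical measurements

# Hour label for each time point, computed in GMT.
hour <- format(x, "%Y-%m-%d %H:00", tz = "GMT")

counts <- table(hour)                     # number of points per hour
hourly_mean <- tapply(value, hour, mean)  # hourly averages
print(counts)
print(hourly_mean)
```

Here the first hour receives 60 points and the value at 11:00 GMT forms a group of one, which can then be kept or discarded explicitly.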

BTW, the cut.POSIXt help page does not mention include.lowest, and I
think that for dates it does something that is perhaps not so
*understandable* (61 minutes in one hour).

datum<-seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")

# part of a datum variable
datum[1379:1381]
[1] "2004-09-01 12:58:00 Střední Evropa (letní čas)" "2004-09-01 12:59:00 Střední Evropa (letní čas)"
[3] "2004-09-01 13:00:00 Střední Evropa (letní čas)"

# the last item seems to me to belong to the interval from 13:00:00 to
# 13:59:00, i.e. it is part of the hour starting at 13:00

cut(datum[1370:1381],"hour", include.lowest=T) 
# this puts it into the previous hour

 [1] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
 [7] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
Levels: 2004-09-01 12:00:00

cut(datum[1370:1381],"hour")
# this drops it from the result (NA): correct, but unfortunate

 [1] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
 [7] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 <NA>
Levels: 2004-09-01 12:00:00

# so as a result an hour can have 61 minutes
levels(cut(datum[1321:1381],"hour", include.lowest=T))
[1] "2004-09-01 12:00:00"

length(cut(datum[1321:1381],"hour", include.lowest=T)) #???
[1] 61
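
Both the NA and the 61-minute hour are consistent with each hour being treated as an interval closed on the left, with 13:00:00 as the highest break. Assuming cut.POSIXt hands the times on to cut() with right = FALSE (an inference from the output above, not something the cut.POSIXt help page states), the same effect can be reproduced with plain numbers, here minutes after 12:00:

```r
# Minutes past 12:00 for four time points; 60 corresponds to 13:00.
x <- c(0, 30, 59, 60)
# A single one-hour interval with breaks at 0 and 60 minutes.
breaks <- c(0, 60)

# Left-closed interval [0, 60): the value 60 equals the highest break,
# so it lies outside every interval and its code is NA.
codes_default <- cut(x, breaks, right = FALSE, labels = FALSE)
print(codes_default)

# With right = FALSE, include.lowest = TRUE closes the *highest* break,
# giving [0, 60]: the value 60 is now counted in the first "hour".
codes_incl <- cut(x, breaks, right = FALSE, labels = FALSE,
                  include.lowest = TRUE)
print(codes_incl)
```

The second call places four values in one interval spanning 61 distinct minutes, which mirrors the length 61 seen above.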

Is this correct?

Thank you again.

Best regards
Petr Pikal


On 3 Nov 2004 at 11:20, Prof Brian Ripley wrote:

> On Wed, 3 Nov 2004, Petr Pikal wrote:
> 
> > Dear all
> > 
> > I am trying to compute hourly averages with the cut() function, which
> > almost works as *I* expected. What puzzled me is that if there is only
> > one item at the end of the data, it results in NA.
> > 
> > An example will explain what I mean
> > 
> > datum<-seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")
> > 
> > cut(datum[1370:1381],"hour", labels=F)
> >  [1]  1  1  1  1  1  1  1  1  1  1  1 NA
> > 
> > cut(datum[1370:1382],"hour", labels=F)
> >  [1] 1 1 1 1 1 1 1 1 1 1 1 2 2
> > 
> > I do not understand why the last item in the first call is NA. I found
> > it only when there was a switch from DST to standard time, as it
> > caused trouble in one of my functions and I found an NA value where I
> > did not expect it.
> 
> cut(datum[1370:1381],"hour", labels=F, include.lowest=T)
> 
> is what you need.  See ?cut, in the See Also, which says
> 
> include.lowest: logical, indicating if an 'x[i]' equal to the lowest
>           (or highest, for 'right = FALSE') 'breaks' value should be
>           included.
> 
> > I can make some workaround, but can you please explain to me why the
> > first call results in an NA value at the end of the vector and whether
> > it is *intended* behaviour.
> 
> It is the documented behaviour, for better or for worse.
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
petr.pikal at precheza.cz



