[R] subset and as.POSIXct / as.POSIXlt oddness

Gabor Grothendieck ggrothendieck at gmail.com
Thu Mar 24 17:09:53 CET 2011


On Thu, Mar 24, 2011 at 9:29 AM, Michael Bach <phaebz at gmail.com> wrote:
> Dear R users,
>
> Given this data:
>
> x <- seq(1,100,1)
> dx <- as.POSIXct(x*900, origin="2007-06-01 00:00:00")
> dfx <- data.frame(dx)
>
> Now to play around for example:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"))
>
> Ok. Now for some reason I want to extract the datapoints between hours
> 10:00:00 and 14:00:00, so I thought well:
>
> subset(dfx, dx > as.POSIXct("2007-06-01 16:00:00"), 14 > as.POSIXlt(dx)$hour
> & as.POSIXlt(dx)$hour < 10)
> Error in as.POSIXlt.numeric(dx) : 'origin' must be supplied
>
> Well that did not work. But why does the following work?
>
> 14 > as.POSIXlt(dx)$hour & as.POSIXlt(dx)$hour < 10
>
> Is there something I miss about subset()? Or is there even another way of
> aggregating over an hourly time interval in a nicer way?
>

Here is yet another solution:

hr <- function(x) as.numeric(format(x, "%H"))
subset(dfx, as.Date(dx) > "2007-06-01" & hr(dx) > 10 & hr(dx) < 14)
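
As a quick check, here is a minimal sketch of the hour filter on your sample data. It assumes tz = "UTC" throughout (an addition here, not in your code) so the result does not depend on the machine's local zone, and it leaves out the date condition to focus on the hours. Both hour conditions go into the single subset argument, joined with &. In your call the second condition was passed as the third argument, which subset() treats as select and evaluates with dx standing for the column number rather than the column, hence the call to as.POSIXlt.numeric.

```r
# Sample data from the question; tz = "UTC" is an assumption of this sketch.
x <- seq(1, 100, 1)
dx <- as.POSIXct(x * 900, origin = "2007-06-01 00:00:00", tz = "UTC")
dfx <- data.frame(dx)

# Hour of day as a number, formatted in an explicitly named zone.
hr <- function(x, tz = "UTC") as.numeric(format(x, "%H", tz = tz))

# Strict inequalities keep hours 11, 12 and 13 only.
res <- subset(dfx, hr(dx) > 10 & hr(dx) < 14)
format(range(res$dx), "%H:%M", tz = "UTC")
```

With 15-minute steps this should leave the rows from 11:00 through 13:45.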

That keeps only hours 11 through 13.  Although that seems to be what
you asked for, perhaps you really meant to include 10:00 and 14:00.
In that case, since the timestamps fall on whole minutes (the data are
15 minutes apart), try this:

hhmm <- function(x) as.numeric(format(x, "%H%M"))
subset(dfx, as.Date(dx) > "2007-06-01" & hhmm(dx) >= 1000 & hhmm(dx) <= 1400)
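
To illustrate the inclusive version, here is a small sketch on the same sample data. As before, tz = "UTC" is an assumption of the sketch rather than something in your code, and the date condition is left out so that only the HHMM comparison is shown.

```r
# Sample data from the question; tz = "UTC" is an assumption of this sketch.
x <- seq(1, 100, 1)
dx <- as.POSIXct(x * 900, origin = "2007-06-01 00:00:00", tz = "UTC")
dfx <- data.frame(dx)

# Time of day as the number HHMM, e.g. 13:45 becomes 1345.
hhmm <- function(x, tz = "UTC") as.numeric(format(x, "%H%M", tz = tz))

# Inclusive bounds keep 10:00 and 14:00 themselves.
inclusive <- subset(dfx, hhmm(dx) >= 1000 & hhmm(dx) <= 1400)
format(range(inclusive$dx), "%H:%M", tz = "UTC")
```

Comparing HHMM as a number works here because the timestamps have no seconds; data with seconds would need "%H%M%S" and bounds of 100000 and 140000 instead.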

Note that format() computes the hours relative to the current time
zone, whereas as.Date() on a POSIXct object uses UTC unless a tz is
given.  Since your data do not seem to involve time zones, you may be
better off using chron rather than POSIXct to avoid potential time
zone errors.  In that case, see R News 4/1 and its references, and note
the availability of hours() and related functions.
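
To make the time zone point concrete, here is a small base R sketch (not chron) showing that the same instant yields different hours depending on the zone passed to format(). The instant t0 and the zone "Asia/Tokyo" are chosen purely for illustration, and the second result assumes the system's time zone database knows that name.

```r
# One fixed instant, defined in UTC (illustrative value).
t0 <- as.POSIXct("2007-06-01 12:00:00", tz = "UTC")

# Same helper as above, but with the zone as an explicit argument.
hr <- function(x, tz = "") as.numeric(format(x, "%H", tz = tz))

hr_utc   <- hr(t0, tz = "UTC")         # hour of t0 in UTC
hr_tokyo <- hr(t0, tz = "Asia/Tokyo")  # hour of t0 in a zone 9 hours ahead of UTC
c(utc = hr_utc, tokyo = hr_tokyo)
```

With the default tz = "" the helper follows the session's zone, so an hour-of-day filter can select different rows on different machines unless the zone is pinned.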


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


