Adams, Jean
jvadams at usgs.gov
Thu May 7 12:32:36 CEST 2015
Two packages are needed to run the code you submitted:
library(dplyr)
library(sqldf)
Your IsHigh() function and its use can be replaced by a single vectorized line of code:
isHighFlow <- as.numeric(Flow>=1600)
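A quick way to check that the one-liner gives the same result as the original helper is to run both on the same simulated data and compare. The seed below is arbitrary and only makes the check reproducible:

```r
# Original element-by-element helper from the question
IsHigh <- function(x) {
  if (x < 1600) return(0)
  if (1600 <= x) return(1)
}

set.seed(42)  # arbitrary seed, only for reproducibility
Flow <- runif(8760, 0, 2300)

# lapply() calls IsHigh once per value; the comparison below is vectorized
viaLapply    <- unlist(lapply(Flow, IsHigh))
viaVectorize <- as.numeric(Flow >= 1600)

identical(viaLapply, viaVectorize)  # TRUE: both are numeric 0/1 vectors
```

The vectorized form does the comparison for all 8760 values in a single call instead of calling an R function 8760 times.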
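You are getting the additional hour from cumsum(). cumsum(isHighFlow==0)
starts a new group at every low-flow hour, so each group (except possibly
the first) is one low-flow hour followed by the high-flow run after it.
A run you would count as zero hours therefore gets n() = 1, a one-hour run
gets 2, and so on. To see how cumsum() accumulates a 0/1 vector, try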
cumsum(c(1, 0, 1, 1, 0, 1, 1, 1, 0))
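The grouping itself can be checked on a small made-up vector using the same expression as your pipeline, cumsum(isHighFlow == 0). The values are illustrative only:

```r
# Made-up high/low indicator, for illustration only
isHighFlow <- c(1, 0, 1, 1, 0, 1, 1, 1, 0)

# Same grouping as the original pipeline: a new group starts at each low hour
highFlowInterval <- cumsum(isHighFlow == 0)
highFlowInterval
# 0 1 1 1 2 2 2 2 3

# Rows per group: this is what n() counts inside summarise()
table(highFlowInterval)
# groups 0, 1, 2, 3 have 1, 3, 4, 1 rows

# Actual high-flow hours per group
tapply(isHighFlow, highFlowInterval, sum)
# groups 0, 1, 2, 3 have 1, 2, 3, 0 high hours
```

Groups 1 to 3 each contain one extra (low-flow) row. Group 0 exists only because the toy series starts with a high hour, and it has no leading low hour, so its n() is already correct.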
If every run is off by one hour, just subtract 1 from hoursHighFlow. Problem solved.
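Alternatively, here is a sketch that avoids the off-by-one entirely. It uses base R's rle() (run-length encoding), which returns the length of each run of equal values directly. This is a different technique rather than a patch to the dplyr code. The toy vector and the hourly timestamps are made up, and tz = "UTC" is an added assumption to avoid daylight-saving duplicates:

```r
# Toy indicator; in the original post this would be as.numeric(Flow >= 1600)
isHighFlow <- c(1, 0, 1, 1, 0, 1, 1, 1, 0)
Date <- format(seq(as.POSIXct("2014-01-01 01:00", tz = "UTC"),
                   by = "hour", length.out = length(isHighFlow)),
               "%Y-%m-%d %H:%M")

# rle() collapses consecutive equal values into runs (lengths + values)
runs <- rle(isHighFlow)

# Row index where each run ends and where it starts
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the high-flow runs (values == 1)
high <- runs$values == 1
highRuns <- data.frame(hoursHighFlow = runs$lengths[high],
                       minDate = Date[starts[high]],
                       maxDate = Date[ends[high]],
                       stringsAsFactors = FALSE)
highRuns
```

Only high-flow hours are counted, so no subtraction is needed. A series that starts with a high-flow run is handled the same way as any other run.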
Jean
On Wed, May 6, 2015 at 5:55 PM, jcrosbie <james at crosb.ie> wrote:
> I'm trying to study the times when flow was at or above a given level.
> To do so, I have written code that measures how long the series stays at
> a high level. But for some reason the run lengths come out one hour too
> long. Any ideas why?
>
> Code:
> Date<-format(seq(as.POSIXct("2014-01-01 01:00"), as.POSIXct("2015-01-01 00:00"),
> by="hour"), "%Y-%m-%d %H:%M", usetz = FALSE)
> Flow<-runif(8760, 0, 2300)
>
> IsHigh<- function(x ){
> if (x < 1600) return(0)
> if (1600 <= x) return(1)
> }
>
> isHighFlow = unlist(lapply(Flow, IsHigh))
>
> df = data.frame(Date, Flow, isHighFlow )
>
>
> temp <- df %>%
> mutate(highFlowInterval = cumsum(isHighFlow==0)) %>%
> group_by(highFlowInterval) %>%
> summarise(hoursHighFlow = n(), minDate = min(as.character(Date)),
> maxDate = max(as.character(Date)))
>
> #Then join the two tables together.
> temp2<-sqldf("SELECT *
> FROM temp LEFT JOIN df
> ON df.Date BETWEEN temp.minDate AND temp.maxDate")
