# [R] Trimming time series to only include complete years

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Tue May 31 00:15:54 CEST 2016

Sorry, I put too many bugs (opportunities for excellence!) into my first
pass at this to leave it alone :-(

isPartialWaterYear2 <- function( d ) {
  dtl <- as.POSIXlt( d )
  # running count of Oct 1 dates seen so far ("mon" is 0-11, so 9 is October)
  wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
  # any 0 in wy1 corresponds to first partial water year
  result <- 0 == wy1
  # if last day is not Sep 30, mark last water year as partial
  if ( 8 != dtl$mon[ length( d ) ]
     || 30 != dtl$mday[ length( d ) ] ) {
    result[ wy1[ length( d ) ] == wy1 ] <- TRUE
  }
  result
}

dat2 <- dat[ !isPartialWaterYear2( dat$Date ), ]
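
As a quick check against the example data from the original question, the same logic can be wrapped so that it returns the trimmed data frame directly. This is only a sketch: the name trimToCompleteWaterYears and its dateCol argument are hypothetical, and it assumes the dates are sorted, one row per day, with no gaps and no NA values.

```r
# Hypothetical wrapper (name and arguments are illustrative, not from the thread).
# Assumes dat[[ dateCol ]] is a sorted Date vector, one row per day, no gaps.
trimToCompleteWaterYears <- function( dat, dateCol = "Date" ) {
  n <- nrow( dat )
  if ( 0 == n ) return( dat )
  dtl <- as.POSIXlt( dat[[ dateCol ]] )
  # running count of Oct 1 dates seen so far ("mon" is 0-11, so 9 is October)
  wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
  # rows before the first Oct 1 belong to the leading partial water year
  partial <- 0 == wy1
  # if the last day is not Sep 30, the final water year is partial too
  if ( 8 != dtl$mon[ n ] || 30 != dtl$mday[ n ] ) {
    partial[ wy1[ n ] == wy1 ] <- TRUE
  }
  dat[ !partial, , drop = FALSE ]
}

# Example series from the original question
dat <- data.frame( Date = seq( as.Date( "2010-01-01" )
                             , as.Date( "2011-11-05" )
                             , by = "day" ) )
dat$Q <- rnorm( nrow( dat ) )

dat2 <- trimToCompleteWaterYears( dat )
range( dat2$Date )  # "2010-10-01" "2011-09-30"
nrow( dat2 )        # 365
```

The result spans exactly the one complete water year that the original poster expected for this example.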

On Sat, 28 May 2016, Jeff Newmiller wrote:

> # note that the "mon" element is 0-11
> isPartialWaterYear <- function( d ) {
>  dtl <- as.POSIXlt( dat$Date )
>  wy1 <- cumsum( ( 9 == dtl$mon ) & ( 1 == dtl$mday ) )
>  ( 0 == wy1  # first partial year
>  | (  8 != dtl$mon[ nrow( dat ) ] # end partial year
>    & 30 != dtl$mday[ nrow( dat ) ]
>    ) & wy1[ nrow( dat ) ] == wy1
>  )
> }
>
> dat2 <- dat[ !isPartialWaterYear( dat$Date ), ]
>
> The above assumes that, as you said, the data are continuous at one-day
> intervals, such that the only partial years will occur at the beginning and
> end. The "diff" function could be used to identify irregular data within the
> data interval if needed.
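
A minimal sketch of that diff check, assuming d is a sorted Date vector. The helper name isDailyRegular is hypothetical and does not come from this thread.

```r
# Hypothetical helper: TRUE when every consecutive pair of dates is one day apart.
# Assumes d is a sorted Date vector with no NA values.
isDailyRegular <- function( d ) {
  all( 1 == diff( as.numeric( d ) ) )
}

d_ok  <- seq( as.Date( "2010-01-01" ), as.Date( "2010-01-10" ), by = "day" )
d_gap <- d_ok[ -5 ]  # drop 2010-01-05 to create a gap

isDailyRegular( d_ok )   # TRUE
isDailyRegular( d_gap )  # FALSE

# positions where the step to the next date is not exactly one day
gapAfter <- which( 1 != diff( as.numeric( d_gap ) ) )
gapAfter                 # 4
d_gap[ gapAfter ]        # "2010-01-04", the last date before the gap
```

If a record fails this check, the cumsum-based trimming would still drop the leading and trailing partial years, but a gap inside an otherwise complete water year would go unnoticed.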
>
> On Fri, 27 May 2016, Morway, Eric wrote:
>
>> In bulk processing streamflow data available from an online database, I'm
>> wanting to trim the beginning and end of the time series so that daily data
>> associated with incomplete "water years" (defined as extending from Oct 1st
>> to the following September 30th) is trimmed off the beginning and end of
>> the series.
>>
>> For a small reproducible example, the time series below starts on
>> 2010-01-01 and ends on 2011-11-05.  So the data between 2010-01-01 and
>> 2010-09-30 and also between 2011-10-01 and 2011-11-05 is not associated
>> with a complete set of data for their respective water years.  With the
>> real data, the initial date of collection is arbitrary, could be 1901 or
>> 1938, etc.  Because I'm cycling through potentially thousands of records, I
>> need help in designing a function that is efficient.
>>
>> dat <-
>> data.frame(Date=seq(as.Date("2010-01-01"),as.Date("2011-11-05"),by="day"))
>> dat$Q <- rnorm(nrow(dat))
>>
>> dat$wyr <- as.numeric(format(dat$Date,"%Y"))
>> is.nxt <- as.numeric(format(dat$Date,"%m")) %in% 1:9
>> dat$wyr[!is.nxt] <- dat$wyr[!is.nxt] + 1
>>
>>
>> function(dat) {
>>   ...
>>   returns a subset of dat such that dat$Date > xxxx-09-30 & dat$Date <
>> yyyy-10-01
>>   ...
>> }
>>
>> where the years between xxxx-yyyy are "complete" (no missing days).  In the
>> example above, the returned dat would extend from 2010-10-01 to 2011-09-30
>>
>> Any offered guidance is very much appreciated.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>                                      Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k