[R] removing data based on date pairs in a separate data frame

William Dunlap wdunlap at tibco.com
Mon Feb 29 19:44:49 CET 2016


If your start/end pairs do not overlap and are sorted by start date, you can
use findInterval() to do this pretty quickly.  E.g.,
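isInABound <- function (x, low, high)
{
    # low[i] and high[i] are the start and end of the i-th interval
    stopifnot(length(low) == length(high))
    # interleave into low[1], high[1], low[2], high[2], ...
    bounds <- rep(low, each = 2)
    bounds[seq(2, length(bounds), by = 2)] <- high
    # the intervals must be sorted and must not overlap
    stopifnot(!is.unsorted(bounds))
    # an odd index means x falls in some [low[i], high[i])
    findInterval(x, bounds) %% 2 == 1
}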
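
To see why the odd/even test works, here is a small numeric sketch. The
bounds and x values are made up for illustration; they are not from the
original data. With its default arguments findInterval() treats each
interval as closed on the left and open on the right, so a value equal to
an end point counts as outside.

```r
# Hypothetical bounds for two intervals: [10, 20) and [30, 40)
bounds <- c(10, 20, 30, 40)
x <- c(5, 10, 15, 20, 25, 35, 40)

# Index of the last bound that is <= x (0 if x is below all bounds)
idx <- findInterval(x, bounds)
idx
# 0 1 1 2 2 3 4

# Odd index: the last bound passed was a start, so x is inside an interval
inside <- idx %% 2 == 1
inside
# FALSE TRUE TRUE FALSE FALSE TRUE FALSE
```
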
> i <- isInABound(mydata$date, mydata_flag$start_date, mydata_flag$end_date)
> mydata[!i,]
                 date species
1 2016-01-31 23:59:53 -559.17
2 2016-02-01 00:00:53 -556.68
5 2016-02-01 00:03:53 -557.36
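
For anyone who wants to rerun this, the following is a self-contained
sketch that rebuilds the two data frames from the values quoted in the
original post. The tz = "UTC" setting is an assumption (the post does not
state a time zone), and the names flagged, flagged2 and kept are only
illustrative. The last part is a slower row-by-row check, not part of the
findInterval() approach, that should also tolerate overlapping or unsorted
pairs and is used here only to cross-check the result.

```r
# Example data copied from the original post; the time zone is assumed
mydata <- data.frame(
  date = as.POSIXct(c("2016-01-31 23:59:53", "2016-02-01 00:00:53",
                      "2016-02-01 00:01:53", "2016-02-01 00:02:53",
                      "2016-02-01 00:03:53", "2016-02-01 00:13:53"),
                    tz = "UTC"),
  species = c(-559.17, -556.68, -554.89, -556.72, -557.36, -561.42)
)
mydata_flag <- data.frame(
  start_date = as.POSIXct(c("2016-02-01 00:01:00", "2016-02-01 00:10:00"),
                          tz = "UTC"),
  end_date   = as.POSIXct(c("2016-02-01 00:03:00", "2016-02-01 00:15:00"),
                          tz = "UTC")
)

isInABound <- function(x, low, high) {
  stopifnot(length(low) == length(high))
  bounds <- rep(low, each = 2)
  bounds[seq(2, length(bounds), by = 2)] <- high
  stopifnot(!is.unsorted(bounds))
  findInterval(x, bounds) %% 2 == 1
}

# Sort the flag periods by start time; isInABound() stops on unsorted bounds
mydata_flag <- mydata_flag[order(mydata_flag$start_date), ]
flagged <- isInABound(mydata$date, mydata_flag$start_date,
                      mydata_flag$end_date)
kept <- mydata[!flagged, ]

# Slower cross-check: compare each time stamp against every start/end pair
flagged2 <- vapply(seq_along(mydata$date), function(k) {
  any(mydata$date[k] >= mydata_flag$start_date &
      mydata$date[k] <  mydata_flag$end_date)
}, logical(1))
stopifnot(identical(flagged, flagged2))
```

Here kept holds rows 1, 2 and 5, matching the output above.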



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Feb 29, 2016 at 7:20 AM, Thomas Barningham <stbarningham at gmail.com>
wrote:

> Dear R users,
>
> I have two data frames.
>
> The first contains a date/time column and the concentration of a species:
>
> head(mydata)
>                  date species
> 1 2016-01-31 23:59:53 -559.17
> 2 2016-02-01 00:00:53 -556.68
> 3 2016-02-01 00:01:53 -554.89
> 4 2016-02-01 00:02:53 -556.72
> 5 2016-02-01 00:03:53 -557.36
> 6 2016-02-01 00:13:53 -561.42
>
>
> The second contains a list of start and end date pairs:
>
> head(mydata_flag)
>            start_date            end_date
> 1 2016-02-01 00:01:00 2016-02-01 00:03:00
> 2 2016-02-01 00:10:00 2016-02-01 00:15:00
>
> I need to loop through all pairs of dates in the mydata_flag data
> frame and then remove any data in the mydata data frame that is
> between each of the date pairs.
>
> The result for what I've presented here would look something like this:
>                  date species
> 1 2016-01-31 23:59:53 -559.17
> 2 2016-02-01 00:00:53 -556.68
> 3 2016-02-01 00:03:53 -557.36
>
> I've searched high and low for an answer to this. I know it's a
> subsetting problem but I don't know how to approach it. Subset answers
> tend to have one start end date pair and keep the data between the
> dates. I need to remove data between the dates and I have a full data
> frame of date/time pairs to consider. For background info: this is to
> flag bad atmospheric data between times that there were known
> instrumentation issues.
>
> Thanks in advance,
>
> Thomas
>
> --
> Thomas Barningham
> Centre for Ocean and Atmospheric Sciences
> School of Environmental Sciences
> University of East Anglia
> Norwich Research Park
> Norwich
> NR4 7TJ
>
