[R] complicated time series filtering issue

Kimmo Elo k|mmo@e|o @end|ng |rom utu@||
Tue Apr 5 15:06:11 CEST 2022


Hi!

Here is an alternative solution using DATENUMBER:

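i <- 2
# nrow() is re-evaluated on every pass, so the loop follows the shrinking
# data frame; `<=` makes sure the last row is checked as well
while (i <= nrow(bc.df)) {
    gap <- bc.df$DATENUMBER[i] - bc.df$DATENUMBER[i-1]
    if (gap <= 10) {
        # not more than 10 days after the previously kept row: drop it
        bc.df <- bc.df[-i, ]
    } else {
        # keep the row and store the gap to the previously kept row
        bc.df$interval[i] <- gap
        i <- i + 1
    }
}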
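
The same rule can also be wrapped in a function and applied per
individual. What follows is only a minimal sketch, not tested beyond the
example data: the names keep_spaced, filter_by_individual and min_gap are
made up for illustration, and it assumes that "more than 10 days after the
previously kept record" is the intended rule and that the column names are
those of the original post.

```r
# Hypothetical helper (not from the thread): return the positions to keep so
# that each kept day number is more than `min_gap` days after the previously
# kept one. Assumes `day` is sorted in increasing order.
keep_spaced <- function(day, min_gap = 10) {
  keep <- logical(length(day))
  if (length(day) == 0L) return(which(keep))
  keep[1] <- TRUE          # the first record is always retained
  last <- day[1]           # day number of the most recently kept record
  for (i in seq_along(day)[-1]) {
    if (day[i] - last > min_gap) {
      keep[i] <- TRUE
      last <- day[i]
    }
  }
  which(keep)
}

# Day numbers of the example individual from the original post
datenumber <- c(133, 272, 274, 281, 327, 393, 433, 549, 553, 582, 728, 762,
                789, 1056, 1185, 1209, 1238, 1239, 1293, 1317, 1335, 1672,
                1673, 1674, 1684, 1757, 1798)
kept <- keep_spaced(datenumber)
dropped <- setdiff(seq_along(datenumber), kept)   # rows 3, 4, 9, 18, 23, 24

# Hypothetical wrapper: filter each individual separately and recompute the
# interval between the records that remain.
filter_by_individual <- function(df, min_gap = 10) {
  pieces <- lapply(split(df, df$INDIVIDUAL), function(d) {
    d <- d[order(d$DATENUMBER), ]
    d <- d[keep_spaced(d$DATENUMBER, min_gap), ]
    d$interval <- c(NA, diff(d$DATENUMBER))
    d
  })
  do.call(rbind, pieces)
}
```

With the full data frame this would be called as
filter_by_individual(bc.df). Plain split() and lapply() are used only to
keep the sketch free of package dependencies; the same helper could
presumably be called inside a data.table by = INDIVIDUAL step instead.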

Best,
Kimmo

On Tue, 2022-04-05 at 15:36 +0300, Eric Berger wrote:
> For a different approach, use the Date column rather than the
> differences column.
> I assume the data has been put into the bc.df data frame (as Jim does,
> above)
> 
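> f <- function(v,m=10L) {
>   w <- 1L  # indices of retained dates; the first is always kept
>   while( (i <- tail(w,1)) < length(v)) {
>     # first later date more than m days after v[i]; the window
>     # i..(i+m+1) suffices when the dates are distinct and sorted
>     nxt <- match(TRUE,v[i:(i+m+1)] > v[i]+m )+(i-1)
>     if (is.na(nxt)) break  # no remaining date is far enough away
>     w <- c(w, nxt)
>   }
>   w
> }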
> f(as.integer(as.Date(strptime(bc.df$Date,"%d-%b-%y"))))
> 
> 
> 
> On Tue, Apr 5, 2022 at 1:16 AM Jim Lemon <drjimlemon using gmail.com>
> wrote:
> 
> > Hi Brian,
> > Perhaps this:
> > 
> > bc.df<-read.table(text="Date   INDIVIDUAL DATENUMBER LENGTH length.prev interval
> > 12-May-04 57084544        133         682.4           NA       NA
> > 28-Sep-04 57084544        272         724.8        682.4      139
> > 30-Sep-04 57084544        274         740.8        724.8        2
> > 7-Oct-04 57084544        281         745.4        740.8        7
> > 22-Nov-04 57084544        327         780.2        745.4       46
> > 27-Jan-05 57084544        393         817.2        780.2       66
> > 8-Mar-05 57084544        433         834.1        817.2       40
> > 2-Jul-05 57084544        549         876.3        834.1      116
> > 6-Jul-05 57084544        553         871.5        876.3        4
> > 4-Aug-05 57084544        582         887.5        871.5       29
> > 28-Dec-05 57084544        728         921.8        887.5      146
> > 31-Jan-06 57084544        762         936.8        921.8       34
> > 27-Feb-06 57084544        789         962.4        936.8       27
> > 21-Nov-06 57084544       1056         972.3        962.4      267
> > 30-Mar-07 57084544       1185        1007.2        972.3      129
> > 23-Apr-07 57084544       1209        1009.1       1007.2       24
> > 22-May-07 57084544       1238         991.6       1009.1       29
> > 23-May-07 57084544       1239        1015.9        991.6        1
> > 16-Jul-07 57084544       1293        1006.5       1015.9       54
> > 9-Aug-07 57084544       1317        1013.0       1006.5       24
> > 27-Aug-07 57084544       1335        1013.0       1013.0       18
> > 29-Jul-08 57084544       1672        1021.5       1013.0      337
> > 30-Jul-08 57084544       1673         984.3       1021.5        1
> > 31-Jul-08 57084544       1674        1008.5        984.3        1
> > 10-Aug-08 57084544       1684        1002.8       1008.5       10
> > 22-Oct-08 57084544       1757         977.6       1002.8       73
> > 2-Dec-08 57084544       1798        1000.6        977.6       41",
> > stringsAsFactors=FALSE,header=TRUE)
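> > # x: intervals in days between consecutive records; x[1] is NA
> > min_interval<-function(x,minint=10) {
> >  indx<-1      # the first record is always retained
> >  cumint<-0    # days accumulated since the last retained record
> >  for(i in seq_along(x)[-1]) {
> >   cumint<-cumint+x[i]
> >   if(cumint > minint) {
> >    indx<-c(indx,i)
> >    cumint<-0
> >   }
> >  }
> >  return(indx)
> > }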
> > min_interval(bc.df$interval)
> > 
> > Jim
> > 
> > On Tue, Apr 5, 2022 at 7:31 AM Ebert,Timothy Aaron <tebert using ufl.edu>
> > wrote:
> > > I think the idea is more
> > > for (i in 2:nrow(x)){
> > > if (x[i]-x[i-1] > 10) {keep x[i]} else {delete x[i]}
> > > }
> > > 
> > > I am not quite clear on the correct code for "keep" or "delete."
> > > 
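> > > One could try
> > > library(dplyr)
> > > # assuming x is the data frame and DATENUMBER holds the day numbers
> > > x$new <- NA
> > > for (i in 2:nrow(x)){
> > > x$new[i] <- x$DATENUMBER[i]-x$DATENUMBER[i-1]
> > > }
> > > x <- x %>% filter(is.na(new) | new>=10)  # is.na() keeps the first row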
> > > 
> > > This only works if consecutive sample dates are 10 or more days
> > > apart. You could add an else if that would accumulate days, and if
> > > successful reset the clock.
> > >
> > > Tim
> > > -----Original Message-----
> > > From: R-help <r-help-bounces using r-project.org> On Behalf Of Bert Gunter
> > > Sent: Monday, April 4, 2022 5:04 PM
> > > To: Cade, Brian S <cadeb using usgs.gov>
> > > Cc: r-help using r-project.org
> > > Subject: Re: [R] complicated time series filtering issue
> > > 
> > > Like this?
> > > 
> > > winnow <- function(x, int=5){
> > >    keep <- x[1]
> > >    remaining <- x[-1]
> > >    while (length(remaining))
> > >    {
> > >       # drop every value not more than int past the last kept value
> > >       remaining <- remaining[remaining > tail(keep,1) + int]
> > >       # the first survivor, if any, is the next value to keep
> > >       if(length(remaining)) keep <- c(keep,remaining[1])
> > >    }
> > >    keep
> > > }
> > > 
> > > > x
> > >  [1]  1  2  5  7  8  9 15 16 17 19 20 21 28 35 37 41 43 45 46 50
> > > > winnow(x,7)
> > > [1]  1  9 17 28 37 45
> > > > winnow(x,5)
> > > [1]  1  7 15 21 28 35 41 50
> > > 
> > > Cheers,
> > > Bert
> > > 
> > > "The trouble with having an open mind is that people keep coming
> > > along and sticking things into it."
> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
> > > 
> > > On Mon, Apr 4, 2022 at 12:56 PM Cade, Brian S via R-help <r-help using r-project.org> wrote:
> > > > Hello:  I have an issue with filtering in a time series of animal
> > > > growth data that seems conceptually simple, but I have not come up
> > > > with effective code to implement it.  I have temporal sequences of
> > > > lengths by individual, and I want to retain only those data that are
> > > > more than 10 days apart sequentially within an individual's records.
> > > > I can readily compute intervals between successive dates by
> > > > individual using data.table() and its by = INDIVIDUAL functionality.
> > > > See the example data for one individual below.  What currently eludes
> > > > me is how to recognize, for example, that deleting the 2nd and 3rd
> > > > rows is required because the totality of their time interval is 9
> > > > days, deleting the 8th record with 4 days is required, deleting the
> > > > 17th record with 1 day is required, and deleting the 22nd and 23rd
> > > > records is required because their sum is 2 days, but we do not delete
> > > > the 24th record of 10 days because the sum of the previous 2 deleted
> > > > records and this one is now 12 days.  Each individual can have very
> > > > different patterns of these sorts of sequences.  These sequences are
> > > > easy to look at and determine what needs to be done, but writing
> > > > effective code to accomplish this filtering seems to require some
> > > > functionality that I am currently missing.
> > > > Any suggestions would be greatly appreciated.
> > > > 
> > > >          Date INDIVIDUAL DATENUMBER LENGTH length.prev interval
> > > > 228 12-May-04   57084544        133  682.4          NA       NA
> > > > 229 28-Sep-04   57084544        272  724.8       682.4      139
> > > > 230 30-Sep-04   57084544        274  740.8       724.8        2
> > > > 231  7-Oct-04   57084544        281  745.4       740.8        7
> > > > 232 22-Nov-04   57084544        327  780.2       745.4       46
> > > > 233 27-Jan-05   57084544        393  817.2       780.2       66
> > > > 234  8-Mar-05   57084544        433  834.1       817.2       40
> > > > 235  2-Jul-05   57084544        549  876.3       834.1      116
> > > > 236  6-Jul-05   57084544        553  871.5       876.3        4
> > > > 237  4-Aug-05   57084544        582  887.5       871.5       29
> > > > 238 28-Dec-05   57084544        728  921.8       887.5      146
> > > > 239 31-Jan-06   57084544        762  936.8       921.8       34
> > > > 240 27-Feb-06   57084544        789  962.4       936.8       27
> > > > 241 21-Nov-06   57084544       1056  972.3       962.4      267
> > > > 242 30-Mar-07   57084544       1185 1007.2       972.3      129
> > > > 243 23-Apr-07   57084544       1209 1009.1      1007.2       24
> > > > 244 22-May-07   57084544       1238  991.6      1009.1       29
> > > > 245 23-May-07   57084544       1239 1015.9       991.6        1
> > > > 246 16-Jul-07   57084544       1293 1006.5      1015.9       54
> > > > 247  9-Aug-07   57084544       1317 1013.0      1006.5       24
> > > > 248 27-Aug-07   57084544       1335 1013.0      1013.0       18
> > > > 249 29-Jul-08   57084544       1672 1021.5      1013.0      337
> > > > 250 30-Jul-08   57084544       1673  984.3      1021.5        1
> > > > 251 31-Jul-08   57084544       1674 1008.5       984.3        1
> > > > 252 10-Aug-08   57084544       1684 1002.8      1008.5       10
> > > > 253 22-Oct-08   57084544       1757  977.6      1002.8       73
> > > > 254  2-Dec-08   57084544       1798 1000.6       977.6       41
> > > > 
> > > > 
> > > > 
> > > > Brian
> > > > 
> > > > 
> > > > 
> > > > Brian S. Cade, PhD
> > > > 
> > > > U. S. Geological Survey
> > > > Fort Collins Science Center
> > > > 2150 Centre Ave., Bldg. C
> > > > Fort Collins, CO  80526-8818
> > > > 
> > > > email:  cadeb using usgs.gov<mailto:brian_cade using usgs.gov>
> > > > tel:  970 226-9326
> > > > 
> > > > ______________________________________________
> > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.

