[R] Keep only first date from consecutive dates

Frank S. f_j_rod at hotmail.com
Wed Dec 9 10:38:19 CET 2015


Many thanks to William Dunlap, Dennis Murphy and David Winsemius for your quick and efficient answers!
 
Best regards,
 
Frank S.
 
 
> Subject: Re: [R] Keep only first date from consecutive dates
> From: dwinsemius en comcast.net
> Date: Fri, 4 Dec 2015 16:34:38 -0800
> CC: f_j_rod en hotmail.com; r-help en r-project.org
> To: wdunlap en tibco.com
> 
> 
> > On Dec 4, 2015, at 1:10 PM, William Dunlap <wdunlap en tibco.com> wrote:
> > 
> > With a data.frame sorted by id, with ties broken by date, as in
> > your example, you can select rows that are either the start
> > of a new id group or the start of a run of consecutive dates with:
> > 
> >> w <- c(TRUE, diff(uci$date)>1) | c(TRUE, diff(uci$id)!=0)
> >> which(w)
> > [1] 1 4 5 7
> >> uci[w,]
> >   id       date value
> > 1  1 2005-10-28     1
> > 4  1 2005-11-07     3
> > 5  1 2007-03-19     1
> > 7  2 2004-06-02     2
> > 
> > I'll leave it to you to translate that R syntax into data.table syntax -
> > it just involves comparing the current row with the previous row.
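
One possible data.table translation of the comparison above is sketched here. It assumes the `uci` table from the original question, and the name `first_of_run` is introduced only for illustration. The logical vector is built inside `[.data.table` so that `date` and `id` can be referred to as columns.

```r
library(data.table)

uci <- data.table(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# The comparison with the previous row only makes sense on sorted data.
setorder(uci, id, date)

# TRUE when a row opens a new id group, or when the gap to the
# previous date is more than one day. `first_of_run` is a hypothetical name.
first_of_run <- uci[, c(TRUE, as.numeric(diff(date)) > 1) | c(TRUE, diff(id) != 0)]

result <- uci[first_of_run]
print(result)
```

The selected rows are the same ones that `which(w)` reports in the base R version.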
> > 
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> > 
> > 
> > On Fri, Dec 4, 2015 at 12:53 PM, Frank S. <f_j_rod en hotmail.com> wrote:
> >> Dear R users,
> >> 
> >> I usually work with the data.table package, but I'm sure that my question can also be answered working with an R data frame.
> >> Working with grouped data (by "id"), I wonder if it is possible to keep in an R data.frame (or R data.table):
> >> a) Only the first row of each group of rows (from the same "id") that have consecutive dates.
> >> b) All the rows which do not belong to the above groups.
> >> 
> >> As an example, I have the "uci" data.table:
> >> 
> >> uci <- data.table(id=c(rep(1,6),2),
> >>                date = as.Date(c("2005-10-28","2005-10-29","2005-10-30","2005-11-07","2007-03-19","2007-03-20","2004-06-02")),
> >>                value = c(1, 2, 1, 3, 1, 2, 2))
> >> 
> >>   id        date  value
> >>    1  2005-10-28      1
> >>    1  2005-10-29      2
> >>    1  2005-10-30      1
> >>    1  2005-11-07      3
> >>    1  2007-03-19      1
> >>    1  2007-03-20      2
> >>    2  2004-06-02      2
> >> 
> >> And the desired output would be:
> >> 
> >>   id        date  value
> >>    1  2005-10-28      1
> >>    1  2005-11-07      3
> >>    1  2007-03-19      1
> >>    2  2004-06-02      2
> 
> The syntax of `[.data.table` is a bit odd: you can refer to columns by name directly. I never trust my intuition about it, though.
> 
> Selection is usually done with a logical vector in the ‘i’ position. The diff() function does work in the ‘i’ position, with the obvious need to prepend a starting value.
> 
> > uci[ c(0,diff(date))!=1, ]
>    id       date value
> 1:  1 2005-10-28     1
> 2:  1 2005-11-07     3
> 3:  1 2007-03-19     1
> 4:  2 2004-06-02     2
> 
> The other cases are handled with the converse expression:
> 
> > uci[c(0,diff(date)) == 1, ]
>    id       date value
> 1:  1 2005-10-29     2
> 2:  1 2005-10-30     1
> 3:  1 2007-03-20     2
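
The expression above compares each row with the previous one regardless of "id". That is enough for the example data, but it could drop the first row of an id whose first date happens to fall one day after the previous id's last date. The sketch below uses hypothetical data (the table `dt` and its values are not from the thread) to show that case, together with a grouped variant that restarts the comparison within each id.

```r
library(data.table)

# Hypothetical data: id 2 starts the day after id 1's last date.
dt <- data.table(
  id    = c(1, 1, 2, 2),
  date  = as.Date(c("2007-03-19", "2007-03-20", "2007-03-21", "2007-03-22")),
  value = c(1, 2, 2, 5)
)

# Ungrouped comparison: the first row of id 2 looks like a
# continuation of id 1's run and is dropped.
ungrouped <- dt[c(0, diff(date)) != 1]

# Grouped comparison: diff() is evaluated within each id, and .I
# returns the row numbers (in dt) of the rows to keep.
keep_rows <- dt[, .I[c(TRUE, as.numeric(diff(date)) != 1)], by = id]$V1
grouped <- dt[keep_rows]

print(ungrouped)
print(grouped)
```

The grouped form assumes the rows are already ordered by date within each id, as they are in the question.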
> 
> 
> >> 
> >> # From the following link, I have tried:
> >> http://stackoverflow.com/questions/32308636/r-how-to-sum-values-from-rows-only-if-the-key-value-is-the-same-and-also-if-the
> >> 
> >> setDT(uci)[ ,list(date=date[1L], value = value[1L]),  by = .(ind=rleid(date), id)][, ind:=NULL][]
> >> 
> >> But I get the same data frame, and I do not know the reason.
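
A likely reason the attempt above returns the same data: `rleid(date)` starts a new run whenever the date value changes, and here every adjacent pair of dates differs, so each row becomes its own group. The sketch below shows this and then builds a run id from the gaps between dates instead. The column name `run` is introduced only for illustration, and this is one possible fix rather than the approach from the linked answer.

```r
library(data.table)

uci <- data.table(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# rleid() groups runs of *equal* values; no two adjacent dates are
# equal, so every row gets its own run id.
naive_runs <- uci[, rleid(date)]
print(naive_runs)

# Hypothetical helper column `run`: within each id, start a new run
# whenever the gap to the previous date is not exactly one day.
uci[, run := cumsum(c(TRUE, as.numeric(diff(date)) != 1)), by = id]

# Keep the first date and value of each (id, run) pair.
result <- uci[, .(date = date[1L], value = value[1L]), by = .(id, run)][, run := NULL][]
print(result)
```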
> >> 
> >> Thank you very much for any help!!
> >> 
> >> Frank S.
> >> 
> >> 
> >> 
> >> 
> >> 
> >>        [[alternative HTML version deleted]]
> >> 
> >> ______________________________________________
> >> R-help en r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius
> Alameda, CA, USA
> 
 		 	   		  