[R] Keep only first date from consecutive dates

William Dunlap wdunlap at tibco.com
Fri Dec 4 22:10:14 CET 2015


With a data.frame sorted by id, with ties broken by date, as in
your example, you can select the rows that start either a new id
group or a new run of consecutive dates with:

> w <- c(TRUE, diff(uci$date)>1) | c(TRUE, diff(uci$id)!=0)
> which(w)
[1] 1 4 5 7
> uci[w,]
  id       date value
1  1 2005-10-28     1
4  1 2005-11-07     3
5  1 2007-03-19     1
7  2 2004-06-02     2
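The transcript above assumes uci already exists in the session. A minimal self-contained sketch of the same selection in base R, assuming uci is rebuilt as a plain data.frame from the question's data (so data.table is not needed), could look like this:

```r
# Rebuild the question's data as a plain data.frame (assumption: no data.table)
uci <- data.frame(
  id    = c(rep(1, 6), 2),
  date  = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30", "2005-11-07",
                    "2007-03-19", "2007-03-20", "2004-06-02")),
  value = c(1, 2, 1, 3, 1, 2, 2)
)

# The method relies on rows being sorted by id, then by date
uci <- uci[order(uci$id, uci$date), ]

# diff() on a Date vector gives day differences; a gap of more than one day
# means the current row starts a new run of consecutive dates
new_run <- c(TRUE, diff(uci$date) > 1)
# a change of id means the current row starts a new id group
new_id  <- c(TRUE, diff(uci$id) != 0)
w <- new_run | new_id

res <- uci[w, ]
print(res)
```

Because same-day duplicates give a difference of 0, the `> 1` test also treats them as part of the same run, so only the first of them is kept.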

I'll leave it to you to translate that base R syntax into data.table syntax;
it just involves comparing the current row with the previous row.
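One possible data.table translation is sketched below. It assumes the data.table package is installed and uses its shift() function (default: lag of 1, filled with NA) to fetch the previous row's date within each id; the helper column name keep is made up for illustration.

```r
library(data.table)

uci <- data.table(id = c(rep(1, 6), 2),
                  date = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30",
                                   "2005-11-07", "2007-03-19", "2007-03-20",
                                   "2004-06-02")),
                  value = c(1, 2, 1, 3, 1, 2, 2))

# Sort by id, then date, by reference (the row-to-row comparison needs this)
setorder(uci, id, date)

# Within each id, compare every row with the previous one.
# shift() returns NA for the first row of a group, so that row is always kept;
# any other row is kept only if it falls more than one day after its predecessor.
uci[, keep := {
  d    <- as.numeric(date)   # days since 1970-01-01
  prev <- shift(d)           # previous row's value within this id (NA for the first)
  is.na(prev) | d - prev > 1
}, by = id]

res <- uci[keep == TRUE, .(id, date, value)]
print(res)
```

Grouping with by = id replaces the explicit "did the id change?" test in the base R version, since shift() never looks across group boundaries.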

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Dec 4, 2015 at 12:53 PM, Frank S. <f_j_rod at hotmail.com> wrote:
> Dear R users,
>
> I usually work with the data.table package, but I'm sure that my question can also be answered using an R data.frame.
> Working with grouped data (by "id"), I wonder whether it is possible to keep, in an R data.frame (or data.table):
> a) Only the first row of each run of rows (with the same "id") that have consecutive dates.
> b) All rows that do not belong to such a run.
>
> As an example, I have the "uci" data.table:
>
> library(data.table)
> uci <- data.table(id = c(rep(1, 6), 2),
>                   date = as.Date(c("2005-10-28","2005-10-29","2005-10-30","2005-11-07","2007-03-19","2007-03-20","2004-06-02")),
>                   value = c(1, 2, 1, 3, 1, 2, 2))
>
>    id       date value
>     1 2005-10-28     1
>     1 2005-10-29     2
>     1 2005-10-30     1
>     1 2005-11-07     3
>     1 2007-03-19     1
>     1 2007-03-20     2
>     2 2004-06-02     2
>
> And the desired output would be:
>
>    id       date value
>     1 2005-10-28     1
>     1 2005-11-07     3
>     1 2007-03-19     1
>     2 2004-06-02     2
>
> Following the approach in this Stack Overflow question, I tried:
> http://stackoverflow.com/questions/32308636/r-how-to-sum-values-from-rows-only-if-the-key-value-is-the-same-and-also-if-the
>
> setDT(uci)[ ,list(date=date[1L], value = value[1L]),  by = .(ind=rleid(date), id)][, ind:=NULL][]
>
> But I get back the same data frame, and I do not know why.
>
> Thank you very much for any help!!
>
> Frank S.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
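As for why the quoted attempt (grouping by .(ind=rleid(date), id)) returns the table unchanged: rleid() starts a new run id whenever the value itself changes, not when the gap exceeds one day. All seven dates differ, so every row gets its own ind and each group holds a single row. A minimal sketch of one fix, assuming data.table is installed and using a hypothetical helper column run that starts a new run only at a gap of more than one day:

```r
library(data.table)

uci <- data.table(id = c(rep(1, 6), 2),
                  date = as.Date(c("2005-10-28", "2005-10-29", "2005-10-30",
                                   "2005-11-07", "2007-03-19", "2007-03-20",
                                   "2004-06-02")),
                  value = c(1, 2, 1, 3, 1, 2, 2))
setorder(uci, id, date)

# rleid(date) gives 1 2 3 4 5 6 7 here: a new run on every row,
# which is why the original grouping changed nothing
print(rleid(uci$date))

# Number runs of consecutive days within each id: a gap of more than
# one day bumps the cumulative sum and so starts a new run
uci[, run := cumsum(c(TRUE, diff(as.numeric(date)) > 1)), by = id]

# Keep the first date and value of each (id, run) group, then drop the helper
res <- uci[, .(date = date[1L], value = value[1L]), by = .(id, run)][, run := NULL][]
print(res)
```

This keeps the structure of the original attempt (first row per group) and only changes how the groups are defined.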


