[R] move date-values from one line to several lines

Marc Schwartz marc_schwartz at me.com
Tue Dec 2 17:14:42 CET 2014


> On Dec 2, 2014, at 9:29 AM, Matthias Weber <Matthias.Weber at fntsoftware.com> wrote:
> 
> Hello together,
> 
> i have a data.frame with date-values. What I want is a data.frame with a several lines for each date.
> 
> My current data.frame looks like this one:
> 
> ID     FROM         TO                REASON
> 1      2015-02-27   2015-02-28    Holiday
> 1      2015-03-15   2015-03-20    Illness
> 2      2015-05-20   2015-02-23    Holiday
> 2      2015-06-01   2015-06-03    Holiday
> 2      2015-07-01   2015-07-01    Illness
> 
> The result looks like this one:
> 
> ID   DATE           REASON
> 1    2015-02-27    Holiday
> 1    2015-02-28    Holiday
> 1    2015-03-15    Illness
> 1    2015-03-16    Illness
> 1    2015-03-17    Illness
> 1    2015-03-18    Illness
> 1    2015-03-19    Illness
> 1    2015-03-20    Illness
> 2    2015-05-20   Holiday
> 2    2015-05-21   Holiday
> 2    2015-05-22   Holiday
> 2    2015-05-23   Holiday
> 2    2015-06-01   Holiday
> 2    2015-06-02   Holiday
> 2    2015-06-02   Holiday
> 2    2015-07-01   Illness
> 
> Maybe anyone can help me, how I can do this.
> 
> Thank you.
> 
> Best regards.
> 
> Mat


A quick and dirty approach.

First, note that in your source data frame, the TO value in the third row is incorrect. I changed it here:

> DF
  ID       FROM         TO  REASON
1  1 2015-02-27 2015-02-28 Holiday
2  1 2015-03-15 2015-03-20 Illness
3  2 2015-05-20 2015-05-23 Holiday
4  2 2015-06-01 2015-06-03 Holiday
5  2 2015-07-01 2015-07-01 Illness

With that in place, you can use R's recycling of values to create multiple data frame rows from the date sequences and the single ID and REASON entries:

i <- 1

> data.frame(ID = DF$ID[i], DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), REASON = DF$REASON[i])
  ID       DATE  REASON
1  1 2015-02-27 Holiday
2  1 2015-02-28 Holiday


So just put that into an lapply() based loop, which returns a list:

> DF.TMP <- lapply(seq(nrow(DF)), 
                   function(i) data.frame(ID = DF$ID[i], 
                                          DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), 
                                          REASON = DF$REASON[i]))

> DF.TMP
[[1]]
  ID       DATE  REASON
1  1 2015-02-27 Holiday
2  1 2015-02-28 Holiday

[[2]]
  ID       DATE  REASON
1  1 2015-03-15 Illness
2  1 2015-03-16 Illness
3  1 2015-03-17 Illness
4  1 2015-03-18 Illness
5  1 2015-03-19 Illness
6  1 2015-03-20 Illness

[[3]]
  ID       DATE  REASON
1  2 2015-05-20 Holiday
2  2 2015-05-21 Holiday
3  2 2015-05-22 Holiday
4  2 2015-05-23 Holiday

[[4]]
  ID       DATE  REASON
1  2 2015-06-01 Holiday
2  2 2015-06-02 Holiday
3  2 2015-06-03 Holiday

[[5]]
  ID       DATE  REASON
1  2 2015-07-01 Illness


Then use do.call() on the result:

> do.call(rbind, DF.TMP)
   ID       DATE  REASON
1   1 2015-02-27 Holiday
2   1 2015-02-28 Holiday
3   1 2015-03-15 Illness
4   1 2015-03-16 Illness
5   1 2015-03-17 Illness
6   1 2015-03-18 Illness
7   1 2015-03-19 Illness
8   1 2015-03-20 Illness
9   2 2015-05-20 Holiday
10  2 2015-05-21 Holiday
11  2 2015-05-22 Holiday
12  2 2015-05-23 Holiday
13  2 2015-06-01 Holiday
14  2 2015-06-02 Holiday
15  2 2015-06-03 Holiday
16  2 2015-07-01 Illness


See ?seq.Date for the critical step.

Regards,

Marc Schwartz



More information about the R-help mailing list