[R] move date-values from one line to several lines

Tue Dec 2 19:51:55 CET 2014

On Tue, 2 Dec 2014, Marc Schwartz wrote:

>
>> On Dec 2, 2014, at 9:29 AM, Matthias Weber <Matthias.Weber at fntsoftware.com> wrote:
>>
>> Hello together,
>>
>> i have a data.frame with date-values. What I want is a data.frame with a several lines for each date.
>>
>> My current data.frame looks like this one:
>>
>> ID     FROM         TO                REASON
>> 1      2015-02-27   2015-02-28    Holiday
>> 1      2015-03-15   2015-03-20    Illness
>> 2      2015-05-20   2015-02-23    Holiday
>> 2      2015-06-01   2015-06-03    Holiday
>> 2      2015-07-01   2015-07-01    Illness
>>
>> The result looks like this one:
>>
>> ID   DATE           REASON
>> 1    2015-02-27    Holiday
>> 1    2015-02-28    Holiday
>> 1    2015-03-15    Illness
>> 1    2015-03-16    Illness
>> 1    2015-03-17    Illness
>> 1    2015-03-18    Illness
>> 1    2015-03-19    Illness
>> 1    2015-03-20    Illness
>> 2    2015-05-20   Holiday
>> 2    2015-05-21   Holiday
>> 2    2015-05-22   Holiday
>> 2    2015-05-23   Holiday
>> 2    2015-06-01   Holiday
>> 2    2015-06-02   Holiday
>> 2    2015-06-02   Holiday
>> 2    2015-07-01   Illness
>>
>> Maybe anyone can help me, how I can do this.
>>
>> Thank you.
>>
>> Best regards.
>>
>> Mat
>
>
> A quick and dirty approach.
>
> First, note that in your source data frame, the TO value in the third row is incorrect. I changed it here:
>
>> DF
>  ID       FROM         TO  REASON
> 1  1 2015-02-27 2015-02-28 Holiday
> 2  1 2015-03-15 2015-03-20 Illness
> 3  2 2015-05-20 2015-05-23 Holiday
> 4  2 2015-06-01 2015-06-03 Holiday
> 5  2 2015-07-01 2015-07-01 Illness
>
> With that in place, you can use R's recycling of values to create multiple data frame rows from the date sequences and the single ID and REASON entries:
>
> i <- 1
>
>> data.frame(ID = DF$ID[i], DATE = seq(DF$FROM[i], DF$TO[i], by = "day"), REASON = DF$REASON[i])
>  ID       DATE  REASON
> 1  1 2015-02-27 Holiday
> 2  1 2015-02-28 Holiday
>
>
> So just put that into an lapply() based loop, which returns a list:
>
>> DF.TMP <- lapply(seq(nrow(DF)),
>                   function(i) data.frame(ID = DF$ID[i],
>                                          DATE = seq(DF$FROM[i], DF$TO[i], by = "day"),
>                                          REASON = DF$REASON[i]))
>
>> DF.TMP
> [[1]]
>  ID       DATE  REASON
> 1  1 2015-02-27 Holiday
> 2  1 2015-02-28 Holiday
>
> [[2]]
>  ID       DATE  REASON
> 1  1 2015-03-15 Illness
> 2  1 2015-03-16 Illness
> 3  1 2015-03-17 Illness
> 4  1 2015-03-18 Illness
> 5  1 2015-03-19 Illness
> 6  1 2015-03-20 Illness
>
> [[3]]
>  ID       DATE  REASON
> 1  2 2015-05-20 Holiday
> 2  2 2015-05-21 Holiday
> 3  2 2015-05-22 Holiday
> 4  2 2015-05-23 Holiday
>
> [[4]]
>  ID       DATE  REASON
> 1  2 2015-06-01 Holiday
> 2  2 2015-06-02 Holiday
> 3  2 2015-06-03 Holiday
>
> [[5]]
>  ID       DATE  REASON
> 1  2 2015-07-01 Illness
>
>
> Then use do.call() on the result:
>
>> do.call(rbind, DF.TMP)
>   ID       DATE  REASON
> 1   1 2015-02-27 Holiday
> 2   1 2015-02-28 Holiday
> 3   1 2015-03-15 Illness
> 4   1 2015-03-16 Illness
> 5   1 2015-03-17 Illness
> 6   1 2015-03-18 Illness
> 7   1 2015-03-19 Illness
> 8   1 2015-03-20 Illness
> 9   2 2015-05-20 Holiday
> 10  2 2015-05-21 Holiday
> 11  2 2015-05-22 Holiday
> 12  2 2015-05-23 Holiday
> 13  2 2015-06-01 Holiday
> 14  2 2015-06-02 Holiday
> 15  2 2015-06-03 Holiday
> 16  2 2015-07-01 Illness
>
>
> See ?seq.Date for the critical step.
>
> Regards,
>
> Marc Schwartz

Same thing, with some optional syntactic sugar:

library(dplyr)
dta <- read.table( text=
"ID     FROM         TO                REASON
1      2015-02-27   2015-02-28    Holiday
1      2015-03-15   2015-03-20    Illness
2      2015-05-20   2015-05-23    Holiday
2      2015-06-01   2015-06-03    Holiday
2      2015-07-01   2015-07-01    Illness
", header=TRUE, as.is=TRUE )

# Wrap function sequence in parentheses so pipes can be at beginning
# of line
(     dta
       # data not provided using dput, so date columns are character
   %>% mutate( FROM = as.Date(FROM)
             , TO = as.Date(TO)
             )
       # process data frame one row at a time
   %>% rowwise
       # form a new data frame using each row, results automatically
       # rbind()ed
   %>% do( data.frame( ID=.$ID
                     , DATE=seq.Date( .$FROM, .$TO, by="day" )
                     , REASON=.$REASON
                     , stringsAsFactors=FALSE
                     )
         )
       # optionally drop "data frame features" provided by dplyr to get
       # comparable result as above
   %>% as.data.frame
)

Read the dplyr and magrittr package help files to learn more about this 
method of handling data. I think Marc's solution is worth understanding 
because that is really what dplyr is doing for you, but it can get tedious 
to do that whole process yourself day-in and day-out.

Dplyr can also be used in conjunction with data.tables package or SQL, 
which can be good if you have a lot of data to work with... again, just 
syntactic sugar, but convenient.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k