[R] sub-setting rows based on dates in R

Jim Lemon drjimlemon at gmail.com
Thu Feb 2 09:07:11 CET 2017


Hi Md,
What I have done is to use the most recent intervening date between the
last set of dates if any are there, otherwise the last set of dates. That
is what I understand from your description.

Remember that this is a very clunky way to do something like this by adding
rows to a data frame, and it is likely to scale up to large data sets badly.

df1<-read.table(text="Date    Rainfall_Duration
 6/14/2016       10
 6/15/2016       20
 6/17/2016       10
 8/16/2016       30
 8/19/2016       40
 8/21/2016       20
 9/4/2016        10",
 header=TRUE,stringsAsFactors=FALSE)
# change the character strings in df2$Date to Date values
df1$Date<-as.Date(df1$Date,"%m/%d/%Y")

df2<-read.table(text="Date    Removal.Rate
 6/17/2016    64.7
 6/30/2016    22.63
 7/14/2016    18.18
 8/19/2016    27.87
 8/30/2016    23.45
 9/2/2016     17.2",
 header=TRUE,stringsAsFactors=FALSE)
# change the character strings in df2$Date to Date values
df2$Date<-as.Date(df2$Date,"%m/%d/%Y")

df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)

df3row<-0

for(i in 1:dim(df2)[1]) {
 rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
 # if there are no dates in df1 within the prior 7 days
 if(!length(rdrows)) {
  # first check if at least one date in df1 is less than the df2
  # date and is not included in the last set of df1 dates
  checkrows<-which(df2$Date[i] >= df1$Date)
  # use the last date greater than the maximum in lastrows
  if(any(checkrows > lastrows))
   rdrows<-max(checkrows[checkrows > lastrows])
  # otherwise use the last set
  else rdrows<-lastrows
 }
 # save the current set of dates
 lastrows<-rdrows
 # get the number of new rows
 nrows<-length(rdrows)
 for(row in 1:nrows) {
  # set the values in each row
  df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
  df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
  df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
 }
 # keep count of the current number of rows
 df3row<-df3row+nrows
}

names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
df3

Jim


On Thu, Feb 2, 2017 at 4:58 AM, Md Sami Bin Shokrana <samimist at live.com>
wrote:

> Hi Jim,
>
> Thank you so much for your help. Your code works great. Could you please
> explain your code a bit? One more thing, I am so sorry that I forgot to
> mention one more criteria in my post. If it is not much trouble, could you
> please help me out with that? I have added a couple more observations
> (the bold ones) to each of my data frames which are shown below:
>
>
> The main concept is,
>
> (i) For a specific date in df2, if no matching dates are available in df1
> within the 7 days range, the code will keep on looking for the latest
> available date in df1 with a "Rainfall_Duration" data. For example, in df2,
> for *8/30/2016*, there is no "Rainfall_Duration" data available in
> df1 within the prior 7 days range. So, I want the code to keep on looking
> for dates in df1 until there is an available data for "Rainfall_Duration"
> in df1 (in this case which is * 8/21/2016)* .
>
>
> (ii) Additionally, for* 9/2/2016 *(df2), there is no date available in
> df1 with a "Rainfall_Duration" data within prior 7 days range. The latest
> available data for "Rainfall_Duration" is *8/21/2016*. So, the code will
> extract the same result we had for *8/30/2016* in df2.
>
>  In simpler words, i just want the code to keep on looking for data with
> "Rainfall_Duraiton" in df1 if there is none available within the prior 7
> days range. Sorry for not mentioning it before.
>
>
>
> df1 <-
>
> Date Rainfall_Duration
> 6/14/2016 10
> 6/15/2016 20
> 6/17/2016 10
> 8/16/2016 30
> 8/19/2016 40
> *8/21/2016* *20* *9/4/2016                 10*
>
>
> df2 <-
>
> Date Removal.Rate
> 6/17/2016 64.7
> 6/30/2016 22.63
> 7/14/2016 18.18
> 8/19/2016 27.87
> *8/30/2016* *23.45* *9/2/2016         17.2*
>
>
> Expected output:
>
>
> df3 <-
>
> Rate.Removal.Date     Date Rainfall_Duration
> 6/17/2016 6/14/2016 10
> 6/17/2016 6/15/2016 20
> 6/17/2016 6/17/2016 10
> 6/30/2016 6/14/2016 10
> 6/30/2016 6/15/2016 20
> 6/30/2016 6/17/2016 10
> 7/14/2016 6/14/2016 10
> 7/14/2016 6/15/2016 20
> 7/14/2016 6/17/2016 10
> 8/19/2016 8/16/2016 30
> 8/19/2016 8/19/2016 40
> *8/30/2016* *8/21/2016* *20* *9/2/2016                 8/21/2016       20*
>
>
> Thanks in advance.
>
>
> ------------------------------
> *From:* Jim Lemon <drjimlemon at gmail.com>
> *Sent:* Wednesday, February 1, 2017 1:18 PM
> *To:* Md Sami Bin Shokrana; r-help mailing list
> *Subject:* Re: [R] sub-setting rows based on dates in R
>
> Hi Md,
> This kind of clunky, but it might do what you want.
>
> df1<-read.table(text="Date    Rainfall_Duration
>  6/14/2016       10
>  6/15/2016       20
>  6/17/2016       10
>  8/16/2016       30
>  8/19/2016       40",
>  header=TRUE,stringsAsFactors=FALSE)
>
> df1$Date<-as.Date(df1$Date,"%m/%d/%Y")
>
> df2<-read.table(text="Date    Removal.Rate
>  6/17/2016    64.7
>  6/30/2016    22.63
>  7/14/2016    18.18
>  8/19/2016    27.87",
>  header=TRUE,stringsAsFactors=FALSE)
>
> df2$Date<-as.Date(df2$Date,"%m/%d/%Y")
>
> df3<-data.frame(Rate.Removal.Date=NULL,Date=NULL,Rainfall_Duration=NULL)
>
> df3row<-0
>
> for(i in 1:dim(df2)[1]) {
>  rdrows<-which(df2$Date[i] >= df1$Date & !(df2$Date[i] > df1$Date + 8))
>  if(!length(rdrows)) rdrows<-lastrows
>  lastrows<-rdrows
>  nrows<-length(rdrows)
>  for(row in 1:nrows) {
>   df3[row+df3row,1]<-format(df2$Date[i],"%m/%d/%Y")
>   df3[row+df3row,2]<-format(df1$Date[rdrows[row]],"%m/%d/%Y")
>   df3[row+df3row,3]<-df1$Rainfall_Duration[rdrows[row]]
>  }
>  df3row<-df3row+nrows
> }
>
> names(df3)<-c("Rate.Removal.Date","Date","Rainfall_Duration")
> df3
>
> Jim
>
> On Wed, Feb 1, 2017 at 3:48 AM, Md Sami Bin Shokrana <samimist at live.com>
> wrote:
> > Hello guys, I am trying to solve a problem in R. I have 2 data frames
> which look like this:
> > df1 <-
> >   Date    Rainfall_Duration
> > 6/14/2016       10
> > 6/15/2016       20
> > 6/17/2016       10
> > 8/16/2016       30
> > 8/19/2016       40
> >
> > df2 <-
> >   Date    Removal.Rate
> > 6/17/2016    64.7
> > 6/30/2016    22.63
> > 7/14/2016    18.18
> > 8/19/2016    27.87
> >
> > I want to look up the dates from df2 in df1 and their corresponding
> Rainfall_Duration data. For example, I want to look for the 1st date of df2
> in df1 and subset rows in df1 for that specific date and 7 days prior to
> that. additionally, for example: for 6/30/2016 (in df2) there is no dates
> available in df1 within it's 7 days range. So, in this case I just want to
> extract the results same as it's previous date (6/17/2016) in df2. Same
> logic goes for 7/14/2016(df2).
> > The output should look like this:
> >
> > df3<-
> >
> > Rate.Removal.Date      Date             Rainfall_Duration
> > 6/17/2016              6/14/2016              10
> > 6/17/2016              6/15/2016              20
> > 6/17/2016              6/17/2016              10
> > 6/30/2016              6/14/2016              10
> > 6/30/2016              6/15/2016              20
> > 6/30/2016              6/17/2016              10
> > 7/14/2016              6/14/2016              10
> > 7/14/2016              6/15/2016              20
> > 7/14/2016              6/17/2016              10
> > 8/19/2016              8/16/2016              30
> > 8/19/2016              8/19/2016              40
> >
> > I could subset data for the 7 days range. But could not do it when no
> dates are available in that range. I have the following code:
> > library(plyr)
> > library (dplyr)
> > df1$Date <- as.Date(df1$Date,format = "%m/%d/%Y")
> > df2$Date <- as.Date(df2$Date,format = "%m/%d/%Y")
> >
> > df3 <- lapply(df2$Date, function(x){
> >   filter(df1, between(Date, x-7, x))
> > })
> >
> > names(df3) <- as.character(df2$Date)
> > bind_rows(df3, .id = "Rate.Removal.Date")
> > df3 <- ldply (df3, data.frame, .id = "Rate.Removal.Date")
> >
> > I hope I could explain my problem properly. I would highly appreciate if
> someone can help me out with this code or a new one. Thanks in advance.
> >
> >
> >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> R-help Info Page - Homepage - SfS – Seminar for Statistics
> <https://stat.ethz.ch/mailman/listinfo/r-help>
> stat.ethz.ch
> The main R mailing list, for announcements about the development of R and
> the availability of new code, questions and answers about problems and
> solutions using R ...
>
>
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list