[R] Sub setting multiple ids based on a 2nd data frame

arun smartpink111 at yahoo.com
Sun Sep 8 16:31:28 CEST 2013


Hi,

The as.numeric() calls in 'indx' are not needed: Date objects can be compared directly.

indx1 <- (as.Date(AB$Start) <= as.Date(AB$Date)) & (as.Date(AB$Date) <= as.Date(AB$End))
identical(indx, indx1)
#[1] TRUE
# Drop Year, Start and End (columns 5 to 7)
AB[indx1, -c(5:7)]
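
As a small standalone check of why the conversion is redundant: a Date is stored internally as a number of days, so the comparison operators give the same answer with or without as.numeric(). The dates and the names d, start and end below are made up for illustration and are not taken from the data in this thread.

```r
# Illustrative dates only; not taken from the data frames in this thread
d     <- as.Date(c("2002-05-09", "2002-05-12", "2002-11-02"))
start <- as.Date("2002-05-10")
end   <- as.Date("2002-11-01")

# Direct comparison of Date objects
in_range <- start <= d & d <= end

# Same test after converting every Date to its underlying number
in_range_num <- as.numeric(start) <= as.numeric(d) & as.numeric(d) <= as.numeric(end)

identical(in_range, in_range_num)
in_range
```

Only the middle date lies inside the window, and both versions of the test agree on that.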


A.K.

----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: R help <r-help at r-project.org>
Cc: Matthew Guzzo <mattguzzo12 at gmail.com>
Sent: Sunday, September 8, 2013 1:37 AM
Subject: Re: Sub setting multiple ids based on a 2nd data frame

Hi Matt,

I changed some of the dates so that a few fall outside the ranges given in dataset B.

A<- read.table(text="
ID      Date             Depth  Temp
1       2002-05-12           10 12
1       2003-05-13           10 12
1       2003-05-14           10 12
1       2004-04-15           10 12
2       2002-05-16           10 12
2       2002-12-17           10 12
2       2003-04-18           10 12
2       2002-05-19           10 12
3       2003-05-10           10 12
3       2004-05-21           10 12
3       2004-05-22           10 12
3       2005-05-10           10 12
3       2006-05-24           10 12
",sep="",header=TRUE,stringsAsFactors=FALSE)
 
B<- read.table(text="
Year   Start    End
2002 2002-05-10 2002-11-01
2003 2003-05-11 2003-11-02
2004 2004-05-12 2004-11-03
2005 2005-05-13 2005-11-04
2006 2006-05-14 2006-11-05
",sep="",header=TRUE,stringsAsFactors=FALSE) 

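# Extract the year from each date so it can be used as the join key
A$Year <- gsub("-.*", "", A$Date)
library(plyr)
# Attach the matching Start and End dates to every row of A
AB <- join(A, B, by = "Year")
# TRUE where Date lies within [Start, End] for that year
indx <- (as.numeric(as.Date(AB$Start)) <= as.numeric(as.Date(AB$Date))) & (as.numeric(as.Date(AB$Date)) <= as.numeric(as.Date(AB$End)))

# Keep the matching rows and drop the Start and End columns (6 and 7)
res <- AB[indx, -c(6, 7)]
res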
#   ID       Date Depth Temp Year
#1   1 2002-05-12    10   12 2002
#2   1 2003-05-13    10   12 2003
#3   1 2003-05-14    10   12 2003
#5   2 2002-05-16    10   12 2002
#8   2 2002-05-19    10   12 2002
#10  3 2004-05-21    10   12 2004
#11  3 2004-05-22    10   12 2004
#13  3 2006-05-24    10   12 2006
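
If you would rather not load plyr, the same idea can probably be done with base R's merge(). This is only a sketch on a cut-down version of the data (four rows of A, three rows of B); the names keep and res2 are just placeholders. Note that merge() sorts the result by the key, so the row order may differ from A.

```r
# Small illustrative data frames (a subset of the values used in this thread)
A <- data.frame(ID    = c(1, 1, 2, 3),
                Date  = c("2002-05-12", "2004-04-15", "2002-12-17", "2006-05-24"),
                Depth = 10, Temp = 12,
                stringsAsFactors = FALSE)
B <- data.frame(Year  = c(2002, 2004, 2006),
                Start = c("2002-05-10", "2004-05-12", "2006-05-14"),
                End   = c("2002-11-01", "2004-11-03", "2006-11-05"),
                stringsAsFactors = FALSE)

# Derive a numeric Year so the key has the same type in both data frames
A$Year <- as.numeric(format(as.Date(A$Date), "%Y"))

# Attach Start and End to each row of A; the result is sorted by Year
AB <- merge(A, B, by = "Year")

# Keep rows whose Date lies within [Start, End] for that year
keep <- as.Date(AB$Start) <= as.Date(AB$Date) & as.Date(AB$Date) <= as.Date(AB$End)
res2 <- AB[keep, c("ID", "Date", "Depth", "Temp", "Year")]
res2
```

Selecting the columns by name rather than by position avoids depending on the column order that merge() returns.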


A.K.


Hi All, 

I accidentally posted this in the data.table forum and deleted it to post here. 

I have some telemetry data that spans multiple years (2002 - 2013) with 
multiple individuals per year. I want to subset the telemetry data to 
include only those data points that fall between specific dates which are 
provided in a 2nd data frame. The telemetry df is in the form of: 

DF "A" 

ID      Date             Depth  Temp 
1       2002-05-12           10 12 
1       2002-05-13           10 12 
1       2002-05-14           10 12 
1       2002-05-15           10 12 
2       2002-05-16           10 12 
2       2002-05-17           10 12 
2       2002-05-18           10 12 
2       2002-05-19           10 12 
3       2002-05-20           10 12 
3       2002-05-21           10 12 
3       2002-05-22           10 12 
3       2002-05-23           10 12 
3       2002-05-24           10 12 

And the df with the dates I want to use to subset is formatted as follows: 

 DF "B" 

Year       Start            End 
2002    2002-05-10      2002-11-01 
2003    2003-05-11      2003-11-02 
2004    2004-05-12      2004-11-03 
2005    2005-05-13      2005-11-04 
2006    2006-05-14      2006-11-05 

So, for each ID in DF A, I want to keep only those data points collected 
on a date that falls between the start and end dates for the corresponding 
year in DF B. 

I am unsure if a loop is my best bet, or using plyr (which I am unfamiliar 
with). I am relatively new to R, so this seems a bit above my head. Any help 
is much appreciated. 

Thanks in advance!


