[R] subset rows in two dataframes

Zhuanshi He zhuanshi.he at gmail.com
Sun May 11 18:37:08 CEST 2008


Dear Jim,

The following code may help.

time1 <- as.Date(c("2006-01-03", "2006-05-03", "2006-05-04",
    "2006-05-11", "2006-05-12", "2006-05-16", "2006-05-19", "2006-05-26",
    "2006-09-15", "2006-10-30", "2006-11-08", "2006-11-14", "2006-11-20"))
volume1 <- c(7312.5, 3352.5, 4252.5, 3825.0, 2700.0, 585.0, 810.0,
    3015.0, 2925.0, 1102.5, 2632.5, 652.5, 1417.5)

dat1 <- data.frame(v1 = time1, v2 = volume1)

time2 <- as.Date(c("2006-05-03", "2006-05-09", "2006-05-04",
    "2006-05-08", "2006-07-14", "2006-07-10", "2006-05-12", "2006-05-17",
    "2006-05-19", "2006-05-26", "2006-05-29", "2006-05-18", "2006-05-22",
    "2006-07-03", "2006-07-05", "2006-07-06", "2006-07-04", "2006-07-24",
    "2006-07-12", "2006-07-18"))
volume2 <- c(4522.5, 7065.0, 3622.5, 7875.0, 3532.5, 3667.5, 6480.0,
    4612.5, 4005.0, 10350.0, 5310.0, 6345.0, 7177.5, 5107.5, 4837.5, 3352.5,
    4050.0, 6772.5, 7290.0, 5625.0)

dat2 <- data.frame(v1 = time2, v2 = volume2)
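
As a side note, here is a small sketch of why the earlier subset() call
returned row 2 of dat1. Each %in% test looks at one column at a time, so a
row passes when its date appears anywhere in dat2$v1 and its volume appears
anywhere in dat2$v2, even if the two hits sit in different rows of dat2.
The snippet uses only an excerpt of the posted data; the names d1, d2,
date_hit and volume_hit are made up for illustration.

```r
# Excerpt of the posted data: row 2 of dat1, rows 1 and 16 of dat2.
d1 <- data.frame(v1 = as.Date("2006-05-03"), v2 = 3352.5)
d2 <- data.frame(v1 = as.Date(c("2006-05-03", "2006-07-06")),
                 v2 = c(4522.5, 3352.5))

# Column-wise membership tests, as in the earlier subset() call.
date_hit   <- d1$v1 %in% d2$v1   # the date is found in the first row of d2
volume_hit <- d1$v2 %in% d2$v2   # the volume is found in the second row of d2
print(date_hit & volume_hit)     # both tests pass, so subset() keeps the row

# Row-wise test: does any single row of d2 carry both values?
same_row <- any(d2$v1 == d1$v1 & d2$v2 == d1$v2)
print(same_row)
```

So the column-wise test can pass while no row of dat2 actually holds that
date and volume together.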

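# Compare every row of dat1 with every row of dat2 and print the index
# of each dat2 row that matches on both columns.
for (i in seq_len(nrow(dat1))) {
    for (j in seq_len(nrow(dat2))) {
        if (dat1[i, 1] == dat2[j, 1] && dat1[i, 2] == dat2[j, 2]) print(j)
    }
}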
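
If the double loop is too slow on the real data, a loop-free variant might
look like the sketch below. It is only a suggestion, not tested beyond the
posted example: merge() with no "by" argument joins on the columns the two
data frames share (here v1 and v2) and keeps only rows present in both. The
second part builds one text key per row with paste() to get row positions;
the names common, key1 and key2 are made up for illustration, and the paste()
key assumes the values print the same way in both data frames.

```r
time1 <- as.Date(c("2006-01-03", "2006-05-03", "2006-05-04",
    "2006-05-11", "2006-05-12", "2006-05-16", "2006-05-19", "2006-05-26",
    "2006-09-15", "2006-10-30", "2006-11-08", "2006-11-14", "2006-11-20"))
volume1 <- c(7312.5, 3352.5, 4252.5, 3825.0, 2700.0, 585.0, 810.0,
    3015.0, 2925.0, 1102.5, 2632.5, 652.5, 1417.5)
dat1 <- data.frame(v1 = time1, v2 = volume1)

time2 <- as.Date(c("2006-05-03", "2006-05-09", "2006-05-04",
    "2006-05-08", "2006-07-14", "2006-07-10", "2006-05-12", "2006-05-17",
    "2006-05-19", "2006-05-26", "2006-05-29", "2006-05-18", "2006-05-22",
    "2006-07-03", "2006-07-05", "2006-07-06", "2006-07-04", "2006-07-24",
    "2006-07-12", "2006-07-18"))
volume2 <- c(4522.5, 7065.0, 3622.5, 7875.0, 3532.5, 3667.5, 6480.0,
    4612.5, 4005.0, 10350.0, 5310.0, 6345.0, 7177.5, 5107.5, 4837.5, 3352.5,
    4050.0, 6772.5, 7290.0, 5625.0)
dat2 <- data.frame(v1 = time2, v2 = volume2)

# Inner join on all shared columns (v1 and v2): rows present in both.
common <- merge(dat1, dat2)
print(nrow(common))   # 0 for the posted data: no row is shared

# Alternative: one key per row, then match whole rows instead of columns.
key1 <- paste(dat1$v1, dat1$v2)
key2 <- paste(dat2$v1, dat2$v2)
print(dat1[key1 %in% key2, ])   # rows of dat1 that also occur in dat2
```

For the posted data both approaches give an empty result, which agrees with
the loop above printing nothing.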


----------------------------------------------------------------------------------------




On 5/11/08, partofy at inoutbox.com <partofy at inoutbox.com> wrote:
> Not exactly. I need something to subset ONLY rows common to both
>  dataframes. In the provided example, dat1 and dat2 have no common rows
>  so I would expect:
>  [1] v1 v2
>  <0 rows> (or 0-length row.names)
>
>  But I can't do it...
>
>
>
>
>
>  On Sun, 11 May 2008 10:07:25 -0400, "Zhuanshi He"
>  <zhuanshi.he at gmail.com> said:
>  > Dear Jim,
>  >
>  > Maybe you want this:
>  >
>  > > subset(dat2, time1 %in% dat2$v1 & time2 %in% dat2$v1)
>  >            v1     v2
>  > 2  2006-05-09 7065.0
>  > 3  2006-05-04 3622.5
>  > 5  2006-07-14 3532.5
>  > 7  2006-05-12 6480.0
>  > 8  2006-05-17 4612.5
>  > 15 2006-07-05 4837.5
>  > 16 2006-07-06 3352.5
>  > 18 2006-07-24 6772.5
>  > 20 2006-07-18 5625.0
>  > Warning message:
>  > In time1 %in% dat2$v1 & time2 %in% dat2$v1 :
>  >   longer object length is not a multiple of shorter object length
>  >
>  >
>  >
>  > However, it looks like time1 and time2 have different lengths.
>  >
>  > --------------------------------------------------------------------------------------------------------------
>  >
>  > On 5/11/08, partofy at inoutbox.com <partofy at inoutbox.com> wrote:
>  > >
>  > >  Dear list:
>  > >
>  > >  I can now reproduce, with a bit of my real data, the problem I asked for
>  > >  your help with yesterday:
>  > >
>  > >  time1<- as.Date(c("2006-01-03", "2006-05-03", "2006-05-04",
>  > >  "2006-05-11", "2006-05-12", "2006-05-16", "2006-05-19", "2006-05-26",
>  > >  "2006-09-15", "2006-10-30", "2006-11-08", "2006-11-14", "2006-11-20"))
>  > >  volume1<- c(7312.5, 3352.5, 4252.5, 3825.0, 2700.0, 585.0, 810.0,
>  > >  3015.0, 2925.0, 1102.5, 2632.5, 652.5, 1417.5)
>  > >  dat1<- data.frame(v1=time1, v2=volume1)
>  > >
>  > >  time2<- as.Date(c("2006-05-03", "2006-05-09", "2006-05-04",
>  > >  "2006-05-08", "2006-07-14", "2006-07-10", "2006-05-12", "2006-05-17",
>  > >  "2006-05-19", "2006-05-26", "2006-05-29", "2006-05-18", "2006-05-22",
>  > >  "2006-07-03", "2006-07-05", "2006-07-06", "2006-07-04", "2006-07-24",
>  > >  "2006-07-12", "2006-07-18"))
>  > >  volume2<- c(4522.5, 7065.0, 3622.5, 7875.0, 3532.5, 3667.5, 6480.0,
>  > >  4612.5, 4005.0, 10350.0, 5310.0, 6345.0, 7177.5, 5107.5, 4837.5, 3352.5,
>  > >  4050.0, 6772.5, 7290.0, 5625.0)
>  > >  dat2<- data.frame(v1=time2, v2=volume2)
>  > >
>  > >  subset(dat1, v1 %in% dat2$v1 & v2 %in% dat2$v2)
>  > >           v1     v2
>  > >  2 2006-05-03 3352.5
>  > >
>  > >  This is not what I expect since this row is not present in dat2 and I
>  > >  just want records present in both dataframes.
>  > >
>  > >  Help?
>  > >
>  > >  J
>  > >
>  > >
>  > >
>  > >
>  > >
>  > >
>  > >  On Sat, 10 May 2008 18:42:51 -0400, "jim holtman" <jholtman at gmail.com>
>  > >  said:
>  > >
>  > > > This seems to work for me:
>  > >  >
>  > >  > > set.seed(1)
>  > >  > > df1 <- data.frame(v1=factor(sample(1:4,20,TRUE)), v2=factor(sample(1:3,20,TRUE)), v3=sample(1:3,20,TRUE))
>  > >  > > df2 <- data.frame(v1=factor(sample(1:2,20,TRUE)), v2=factor(sample(1:2,20,TRUE)), v3=sample(1:2,20,TRUE))
>  > >  > > subset(df1, (df1$v1 %in% df2$v1) & (df1$v2 %in% df2$v2) & (df1$v3 %in% df2$v3))
>  > >  >    v1 v2 v3
>  > >  > 2   2  1  2
>  > >  > 5   1  1  2
>  > >  > 11  1  2  2
>  > >  > 14  2  1  1
>  > >  > >
>  > >  >
>  > >  > Exactly what problems are you having?  A sample of your actual data
>  > >  > would be useful.
>  > >  >
>  > >  > On Sat, May 10, 2008 at 6:31 PM,  <partofy at inoutbox.com> wrote:
>  > >  > > Dear list:
>  > >  > >
>  > >  > > I have two dataframes, say dat1 and dat2. Each has several variables, but
>  > >  > > 3 of them are common to both (say v1, v2 and v3). v1 and v2 are
>  > >  > > factors while v3 is numeric. Now, I need a subset to extract the rows
>  > >  > > in which v1, v2 and v3 are the same in both dataframes.
>  > >  > > I tried:
>  > >  > >
>  > >  > > subset(dat1, dat1$v1 %in% dat2$v1 & dat1$v2 %in% dat2$v2 & dat1$v3 %in%
>  > >  > > dat2$v3)
>  > >  > >
>  > >  > > I don't know why, but this is not working as I was expecting. Any
>  > >  > > suggestion to improve my code?
>  > >  > >
>  > >  > > Thanks in advance
>  > >  > >
>  > >  > > Justin
>  > >  > > --
>  > >  > >
>  > >  > >  partofy at inoutbox.com
>  > >  > >
>  > >  > > --
>  > >  > >
>  > >  > > ______________________________________________
>  > >  > > R-help at r-project.org mailing list
>  > >  > > https://stat.ethz.ch/mailman/listinfo/r-help
>  > >  > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>  > >  > > and provide commented, minimal, self-contained, reproducible code.
>  > >  > >
>  > >  >
>  > >  >
>  > >  >
>  > >  > --
>  > >  > Jim Holtman
>  > >  > Cincinnati, OH
>  > >  > +1 513 646 9390
>  > >  >
>  > >  > What is the problem you are trying to solve?
>  > >
>  > > --
>  > >
>  > >
>  > >   partofy at inoutbox.com
>  > >
>  > >  --
>  > >
>  > >
>  > >
>  > >
>  >
>  >
>  > --
>  > Zhuanshi He / Z. He (PhD)
>  > Waterloo Centre for Atmospheric Sciences (WCAS)
>  > Department of Earth and Environmental Sciences
>  > Phy Bldg, Rm 2022
>  > University of Waterloo,
>  > Waterloo, ON N2L 3G1
>  > Canada
>  > Tel: +1-519-888-4567 ext 38053        FAX: +1-519-746-0435
>
> --
>
>   partofy at inoutbox.com
>
>
>
>


-- 
Zhuanshi He / Z. He (PhD)
Waterloo Centre for Atmospheric Sciences (WCAS)
Department of Earth and Environmental Sciences
Phy Bldg, Rm 2022
University of Waterloo,
Waterloo, ON N2L 3G1
Canada
Tel: +1-519-888-4567 ext 38053        FAX: +1-519-746-0435


