[R] Matching values between 2 data.frame.

William Dunlap wdunlap at tibco.com
Sat May 20 22:54:34 CEST 2017


merge() may be useful here:

> merge(OriginalData[1:3], TargetValue, by.x="AA1", by.y="AA",
sort=FALSE)[-1]
   Value1 Value2 BB Value
1       1     11  B     7
2       3     13  B     7
3      11     21  B     7
4       2     12  B    25
5      12     22  B    25
6       9     19  B    25
7      10     20  B    25
8       4     14  B     5
9       6     16  B     5
10      5     15  B    10
11      7     17 CC    21
12      8     18 CC    21

Rename the columns as desired.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Sat, May 20, 2017 at 1:13 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:

> Like this?  (use indexing to avoid explicit loops whenever possible):
>
> ## first convert factor columns to character, as David W. suggested
> i <- sapply(od,is.factor)
> od[i]<- lapply(od[i],as.character)
> i <- sapply(tv, is.factor)
> tv[i]<- lapply(tv[i],as.character)
>
>
> ## Now use ?match
> wh  <-   match(od[,"AA1"], tv[,"AA"])
> matched <- !is.na(wh)  ## only needed if not all AA1's match in AA
> od[matched,c("AA1", "Value")] <- tv[wh[matched],c("BB","Value")]
>
> > od
>    Value1 Value2 AA1 Value
> 1       1     11   B     7
> 2       2     12   B    25
> 3       3     13   B     7
> 4       4     14   B     5
> 5       5     15   B    10
> 6       6     16   B     5
> 7       7     17  CC    21
> 8       8     18  CC    21
> 9       9     19   B    25
> 10     10     20   B    25
> 11     11     21   B     7
> 12     12     22   B    25
>
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sat, May 20, 2017 at 11:53 AM, David Winsemius
> <dwinsemius at comcast.net> wrote:
> >
> >> On May 20, 2017, at 11:23 AM, Christofer Bogaso <
> bogaso.christofer at gmail.com> wrote:
> >>
> >> Hi again,
> >>
> >> Let say I have below 2 data frames.
> >>
> >> OriginalData = data.frame('Value1' = 1:12, 'Value2' = 11:22, 'AA1' =
> >> c('AA4', 'AA3', 'AA4', 'AA1', 'AA2', 'AA1', 'AA6', 'AA6', 'AA3',
> >> 'AA3', 'AA4', 'AA3'), 'Value' = NA)
> >>
> >> TargetValue = data.frame('AA' = c('AA1', 'AA2', 'AA3', 'AA4', 'AA5',
> >> 'AA6'), 'BB' = c('B', 'B', 'B', 'B', 'CC', 'CC'), 'Value' = c(5, 10,
> >> 25, 7, 35, 21))
> >>
> >> OriginalData
> >> TargetValue
> >>
> >> Now I need to replace OriginalData's 'AA1' column with TargetValue's
> >> 'BB' column, based on matched values between 'AA1' & 'AA' columns of
> >> OriginalData & TargetValue respectively. With this same law, I need to
> >> update 'Value' column of  OriginalData with that of TargetValue.
> >>
> >> As an example, after replacement by above rule, 1st row of
> >> OriginalData should look like :
> >>
> >>> OriginalData
> >>
> >>   Value1 Value2 AA1 Value
> >>
> >> 1       1     11 B    7
> >>
> >> Values of TargetValue's 'AA' column are unique i.e. no duplication
> >>
> >> Previously I have implemented a 'for' loop to implement above, however
> >> since both of my data.frames are quite big, it is taking long time to
> >> execute. Is there any 'R' way to implement this quickly.
> >>
> >> Appreciate for any pointer.
> >
> > It's going to have a greater chance of delivering the desired result if
> you convert the factor columns into character.
> >
> >>
> >> Thanks,
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list