[R] Ids with matching number combinations?

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Fri Oct 7 18:19:16 CEST 2022


Well, for a start, you might give us a reproducible example that actually
runs -- yours doesn't. Did you check? You seem to b.e missing a final
")"(Also, you do not need to quote the column names in data.frame(), though
it works fine also if you do).

Also note that in df, your id column has length 1000 and item1 column has
length 100 and will be replicated to match id. This seems to me likely to
be misspecified. Or is this what you meant?

Finally, note that that none of the rows in df2 have identical numbers, so
there is no id in df that can match both members of the row. So did you
mean that the 'id' and 'item1' value in a row of df must match the
corresponding 'a' and 'b' values of some row in df2 ?

Under the above interpretation and with the following reprex(note the use
of set.seed() to make it reproducible)...

set.seed(1234)  ## for reproducibility
df=data.frame(id=c(10,rep(1:10, each=10), item1=sample(1:20, 100,
replace=T))
df2=data.frame(a=c(8, 8,10,9, 5, 1,2,1), b=c(16,18,11, 19,18, 11,17,12))

... you can paste0() the column vectors together in each data frame(as
character values) and then just match on the single character vectors, like
this:

both <- match(do.call(paste0,df),  do.call(paste0, df2))
## subscript the data frames if you have more columns that are not used for
matching

> both
  [1] NA NA  8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [21] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [61] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [81] NA NA NA NA NA NA NA NA NA NA  3 NA NA NA NA NA NA NA NA  3

Then
> both[!is.na(both)] ## the rows of df2 that matched
[1] 8 3 3

> which(!is.na(both)) ## the rows of df that were matched
[1]   3  91 100

This should be very efficient, as hashing is used for matching.

I think there is a slicker way to do this that combines the paste and match
functions together (other than using merge() or the like, as has already
been suggested). But I have forgotten the details or I may just be thinking
of merge(), which may indeed be a better option.

All presuming this is what you meant, of course, which it may not be. :-(

Cheers,
Bert









On Fri, Oct 7, 2022 at 5:57 AM Marine Andersson <marine.andersson using ki.se>
wrote:

> Hi,
>
> If I have two datasets like this:
> df=data.frame("id"=rep(1:10,10, each=10), "item1"=sample(1:20, 100,
> replace=T)
> df2=data.frame("a"=c(8, 8,10,9, 5, 1,2,1), "b"=c(16,18,11, 19,18,
> 11,17,12))
>
> How do I find out which ids in the df dataset that has a match for both
> the numbers occuring in the same row in the df2 dataframe? In the output I
> would like to get the matching id and the rownumber from the df2.
>
> Output something like this
> Id                        Rownr
> 2                         1
> 5                         1
> 7                         4
>
> My actual problem is more complex with even more columns to be matched and
> the datasets are large, hence the solution needs to be efficient.
>
> Kind regards,
>
>
>
>
>
> N?r du skickar e-post till Karolinska Institutet (KI) inneb?r detta att KI
> kommer att behandla dina personuppgifter. H?r finns information om hur KI
> behandlar personuppgifter<
> https://ki.se/medarbetare/integritetsskyddspolicy>.
>
>
> Sending email to Karolinska Institutet (KI) will result in KI processing
> your personal data. You can read more about KI's processing of personal
> data here<https://ki.se/en/staff/data-protection-policy>.
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list