[R] One-to-one matching?

Charles C. Berry cberry at tajo.ucsd.edu
Mon Jun 23 05:30:03 CEST 2008



Or use make.unique() in place of apseq(). It needs no sorting, so
match() returns positions in the original vectors.
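
A minimal sketch of that idea, reusing the example vectors from the quoted
reply below. It assumes both inputs are character vectors (make.unique()
requires that, so anything else would need as.character() first) and that no
value already looks like a generated name such as "a.1", which may otherwise
cause spurious matches. The name one_to_one is only illustrative.

```r
lookupTable <- c("a", "a", "b", "c", "d", "e", "f")
matchSample <- c("a", "a", "a", "b", "d")

# make.unique() leaves the first occurrence of a value alone and appends
# ".1", ".2", ... to later ones, so the k-th "a" in the sample can only
# match the k-th "a" in the table.
make.unique(matchSample)
# [1] "a"   "a.1" "a.2" "b"   "d"

one_to_one <- match(make.unique(matchSample), make.unique(lookupTable))
one_to_one
# [1]  1  2 NA  3  5
```

The third "a" gets NA because the table holds only two of them.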

On Sun, 22 Jun 2008, Gabor Grothendieck wrote:

> Try this.  apseq() sorts the input and appends a
> sequence number: 0, 1, ... to successive
> occurrences of each value.  Applying that to both
> vectors turns this into a problem that ordinary
> match() can solve:
>
>> lookupTable <- c("a", "a","b","c","d","e","f")
>> matchSample <- c("a", "a","a","b","d")
>>
>> # sort and append sequence no
>> apseq <- function(x) {
> + x <- sort(x)
> + s <- cumsum(!duplicated(x))
> + paste(x, seq(s) - match(s, s))
> + }
>>
>> match(apseq(matchSample), apseq(lookupTable))
> [1]  1  2 NA  3  5
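
Because apseq() sorts its argument, the indices above refer to positions in
the sorted vectors, which coincide with the original positions here only
because both inputs are already sorted. The sketch below numbers occurrences
without sorting, using ave(). The helper name tag_occurrence and the unsorted
test vectors are made up for illustration, and the code assumes there are no
missing values in the inputs.

```r
# Hypothetical helper (not from the thread): tag each element with its
# occurrence number (1, 2, ...) within its own value, keeping input order.
tag_occurrence <- function(x) {
  x <- as.character(x)
  # ave() applies seq_along() within each group of equal values and puts
  # the results back in the original positions.
  k <- ave(seq_along(x), x, FUN = seq_along)
  paste(x, k)
}

lookupTable <- c("f", "a", "d", "a", "b")   # deliberately unsorted
matchSample <- c("a", "d", "a", "a")

res <- match(tag_occurrence(matchSample), tag_occurrence(lookupTable))
res
# [1]  2  3  4 NA
```

Here the indices point into the unsorted lookupTable: the two available "a"
entries at positions 2 and 4 are used once each, and the third "a" gets NA.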
>
>
> On Sun, Jun 22, 2008 at 10:57 PM,  <Alec.Zwart at csiro.au> wrote:
>> Hi folks,
>>
>> Can anyone suggest an efficient way to do "matching without
>> replacement", or "one-to-one matching"?  pmatch() doesn't quite provide
>> what I need...
>>
>> For example,
>>
>> lookupTable <- c("a","b","c","d","e","f")
>> matchSample <- c("a","a","b","d")
>> ##Normal match() behaviour:
>> match(matchSample,lookupTable)
>> [1] 1 1 2 4
>>
>> My problem here is that both "a"s in matchSample are matched to the same
>> "a" in the lookup table.  I need the elements of the lookup table to be
>> excluded from the table as they are matched, so that no match can be
>> found for the second "a".
>>
>> Function pmatch() comes close to what I need:
>>
>> pmatch(matchSample,lookupTable)
>> [1] 1 NA 2 4
>>
>> Yep!  However, pmatch() incorporates partial matching, which I
>> definitely don't want:
>>
>> lookupTable <- c("a","b","c","d","e","aaaaaaaaf")
>> matchSample <- c("a","a","b","d")
>> pmatch(matchSample,lookupTable)
>> [1] 1 6 2 4
>> ## i.e. the second "a" matches "aaaaaaaaf" - I don't want this.
>>
>> Of course, when identical items ARE duplicated in both sample and lookup
>> table, I need the matching to reflect this:
>>
>> lookupTable <- c("a","a","c","d","e","f")
>> matchSample <- c("a","a","c","d")
>> ##Normal match() behaviour
>> match(matchSample,lookupTable)
>> [1] 1 1 3 4
>>
>> No good - pmatch() is better:
>>
>> lookupTable <- c("a","a","c","d","e","f")
>> matchSample <- c("a","a","c","d")
>> pmatch(matchSample,lookupTable)
>> [1] 1 2 3 4
>>
>> ...but we still have the partial matching issue...
>>
>> ##And of course, as per the usual behaviour of match(), sample elements
>> missing from the lookup table should return NA:
>>
>> matchSample <- c("a","frog","e","d") ; print(matchSample)
>> match(matchSample,lookupTable)
>>
>> Is there a nifty way to get what I'm after without resorting to a for
>> loop? (my code's already got too blasted many of those...)
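
For checking a vectorised answer against the behaviour described above, a
plain loop can serve as a reference. This is only a sketch of the stated
requirements (exact matches only, each table element used at most once, NA
when nothing is left); the function name match_once_loop is invented here and
is not part of R.

```r
# Hypothetical reference implementation, intended for testing only.
match_once_loop <- function(x, table) {
  available <- rep(TRUE, length(table))   # table entries not yet used
  out <- rep(NA_integer_, length(x))
  for (i in seq_along(x)) {
    # exact comparison only; which() also drops any NA comparisons
    hit <- which(available & table == x[i])
    if (length(hit) > 0) {
      out[i] <- hit[1]
      available[hit[1]] <- FALSE          # remove the matched entry
    }
  }
  out
}

r1 <- match_once_loop(c("a", "a", "b", "d"),
                      c("a", "b", "c", "d", "e", "aaaaaaaaf"))
r1
# [1]  1 NA  2  4

r2 <- match_once_loop(c("a", "frog", "e", "d"),
                      c("a", "a", "c", "d", "e", "f"))
r2
# [1]  1 NA  5  4
```

Unlike pmatch(), the second "a" in the first call does not fall through to
"aaaaaaaaf".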
>>
>> Thanks,
>>
>> Alec Zwart
>> CMIS CSIRO
>> alec.zwart at csiro.au
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


