[R] Pattern Matching within Vector?

Anne-Marie Ternes amternes at gmail.com
Mon Sep 21 17:07:07 CEST 2009


Dear mailing list,

I'm stuck with a tricky problem here - at least it seems tricky to me,
as I'm not very experienced with pattern matching and regex matters.

I'm analysing amino acid mutations by position and type of mutation.
E.g. (fictitious example) in position 92, I can find L92V, L92MV,
L92I... In this example L is the wild-type amino acid, and everything
after the position number is a mutation (single amino acid or
mixture). I'm only interested in the mutation information, so:

Say I've got this vector:
bla <- c("V", "MV", "I", "IL", "PT", "M", "E", "OM")

I'd like to count only those elements that are "truly unique"
mutations, i.e. count "V", "MV" as 1, "I", "IL" as 1, "PT" as 1, "M" as
1, "E" as 1, and not count "OM".

I could do it iteratively:
Element 1: V. Keep.
Element 2: MV. Match Keep vs New -> 1. I already have a V, so don't count.
Element 3: I. Match Keep vs New -> 0. I is new, keep. Keep = V,I
Element 4: IL. Match Keep vs New -> 1. I already have an I, so don't count.
Element 5: PT. Match Keep vs New -> 0. PT is new, keep. Keep = V,I,PT
Element 6: M. Match Keep vs New -> 0. M is new, keep. Keep = V,I,PT,M
Element 7: E. Match Keep vs New -> 0. E is new, keep. Keep = V,I,PT,M,E
Element 8: OM. Match Keep vs New -> 1. I already have an M, so don't count.

Keep vector = (V,I,PT,M,E), count = 5
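
The steps above could be sketched as a plain loop in R, roughly like
this. It is only a minimal sketch: the function name
keep_unique_mutations is made up for illustration, and it assumes each
amino acid is a single letter, so that "MV" can be split into "M" and
"V".

```r
# Example vector from the post
bla <- c("V", "MV", "I", "IL", "PT", "M", "E", "OM")

# Hypothetical helper (not from any package): walks through x in order
# and keeps an element only if it shares no letter with the elements
# already kept.
keep_unique_mutations <- function(x) {
  keep <- character(0)                          # elements kept so far
  for (el in x) {
    new_letters  <- strsplit(el, "")[[1]]       # letters of the new element
    kept_letters <- unlist(strsplit(keep, ""))  # all letters already kept
    if (!any(new_letters %in% kept_letters)) {
      keep <- c(keep, el)                       # no overlap: element is new
    }
  }
  keep
}

keep <- keep_unique_mutations(bla)
keep          # "V" "I" "PT" "M" "E"
length(keep)  # 5
```

Note that the result depends on the order of the elements, since each
new element is only compared with what has been kept so far.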

OK. There must be a more elegant way to do this! Something with
vector-wise pattern matching, perhaps? By the way, I don't care
which of e.g. "V" or "MV" is counted; what is important is that
together they are only counted as 1.

Thanks for your help!

Anne-Marie



