[R] text matching and substitution

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Sat Mar 28 22:49:17 CET 2009


Stephan Kolassa wrote:
> Hi Simeon,
>
> I played around a little with Vectorize and mapply, but I couldn't
> make it work :-( So, my best guess would be a simple loop like this:
>
> result <- as.character(paste(letters,colours(),"stuff",LETTERS))
> target <- c("red","blue","green","gray")
> for ( new.color in target ) { result[grep(new.color,result)] <-
> new.color }
>
> Best of luck,
> Stephan
>
>
> simeon duckworth schrieb:
>> stephan
>>
>> sorry for not being clear - but thats exactly what i want.
>>
>> i'd like to replace every complex string that contains "red" with just
>> "red", and then so on with "blue", "yellow" etc
>>
>> my data is of the form
>>
>> "xxxxx xx xx xxxxx  red xx xxx xx"
>> "xx xxx xxx xx  blue xx xx xx xx x"
>> "x xx xxxxxxxx xx xx xx xxxx red"
>> "red xx xx xx xx xx"
>> "xx xx xx xx xx xx"
>> "xx x x x x xxxx"
>>
>> which i'd like to replace with
>> "red"
>> "blue"
>> "red"
>> "red"
>> "other"
>> "other"
>>

if you have a fixed collection of strings (here, colour names) that you
want to recognize within each string and use as its replacement, here's
another way to do it, with a single vectorized call to sub:

    # some dummy data ...
    colors = sample(colors(), 10)
    data = replicate(10,
       paste(sep=' ',
          paste(sample(letters, sample(10, 1)), collapse=''),
          sample(colors, 1),
          paste(sample(letters, sample(10, 1)), collapse='')))

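    # ... and the actual solution: the lazy '.*?' skips ahead to the first
    # colour name, the group captures it, the trailing '.*' eats the rest,
    # and '\\1' replaces the whole string with the captured name
    output = sub(perl=TRUE,
        x=data,
        pattern=sprintf('.*?(%s).*', paste(colors, collapse='|')),
        replacement='\\1')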

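this handles every string that contains one of the colours. note that
sub returns a string with no match unchanged, so the "other" case in
your example still needs a separate step (e.g. a grepl test on the same
pattern).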
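
a minimal sketch of one way to cover the "other" case, run against the
six example strings quoted above. the helper name collapse_to_colour and
its 'other' argument are made up for illustration, and the \b word
boundaries are an addition that assumes colour names appear as whole
words; neither is part of the code above.

```r
# hypothetical helper, not from the thread: reduce each string to the
# first colour name it contains, or to 'other' when it contains none
collapse_to_colour = function(x, colours, other='other') {
    # \\b keeps 'red' from matching inside e.g. 'darkred' (an assumption
    # about the data: colour names occur as whole words)
    pattern = sprintf('.*?\\b(%s)\\b.*', paste(colours, collapse='|'))
    # grepl flags the strings that contain a colour; sub on its own
    # would hand the remaining strings back unchanged
    ifelse(grepl(pattern, x, perl=TRUE),
        sub(pattern, '\\1', x, perl=TRUE),
        other)
}

data = c('xxxxx xx xx xxxxx  red xx xxx xx',
    'xx xxx xxx xx  blue xx xx xx xx x',
    'x xx xxxxxxxx xx xx xx xxxx red',
    'red xx xx xx xx xx',
    'xx xx xx xx xx xx',
    'xx x x x x xxxx')
result = collapse_to_colour(data, c('red', 'blue', 'green', 'gray'))
print(result)
```

if a string contains several colour names, the lazy '.*?' means the
leftmost one wins.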

vQ



