[R] text matching and substitution

jim holtman jholtman at gmail.com
Sat Mar 28 18:27:09 CET 2009


Does this do what you want?

> x <- c('xxxredxxx', 'xxxxbluexxxx', 'xxxxxx', 'greenbluered')
> pat <- 'red|green|blue'
> result <- sub(paste("^.*?(", pat, ").*", sep=""), "\\1", x)
> # strings with no match in the original become 'other'; a logical index
> # stays correct even when no string matches at all
> result[!grepl(pat, x)] <- 'other'
> result
[1] "red"   "blue"  "other" "red"
>
>
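
The original question also asked for matching irrespective of case. Here is a minimal sketch of one way to fold that in; the helper name simplify_colour and the mixed-case test vector are made up for illustration and are not from the thread. It assumes perl = TRUE, under which ".*?" is a lazy quantifier, so the leftmost colour in each string is the one kept.

```r
# Hypothetical helper (not from the thread): reduce each string to the first
# colour it contains, ignoring case; strings with no colour become "other".
simplify_colour <- function(x, targets) {
  pat <- paste(targets, collapse = "|")
  # perl = TRUE makes ".*?" lazy, so the leftmost colour is the one captured
  hit <- sub(paste0("^.*?(", pat, ").*$"), "\\1", x,
             ignore.case = TRUE, perl = TRUE)
  # the captured text keeps its original case, so normalise it
  out <- tolower(hit)
  # logical indexing is safe even when nothing (or everything) matches
  out[!grepl(pat, x, ignore.case = TRUE)] <- "other"
  out
}

x <- c("xxxREDxxx", "xxxxbluexxxx", "xxxxxx", "greenbluered")
simplify_colour(x, c("red", "green", "blue"))
# [1] "red"   "blue"  "other" "green"
```

Note that with a lazy match the last string gives "green", the first colour it contains, rather than the "red" shown in the transcript above.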


On Sat, Mar 28, 2009 at 1:08 PM, Stephan Kolassa <Stephan.Kolassa at gmx.de> wrote:
> Hi Simeon,
>
> I played around a little with Vectorize and mapply, but I couldn't make it
> work :-( So, my best guess would be a simple loop like this:
>
> result <- as.character(paste(letters,colours(),"stuff",LETTERS))
> target <- c("red","blue","green","gray")
> for ( new.color in target ) { result[grep(new.color,result)] <- new.color }
>
> Best of luck,
> Stephan
>
>
> simeon duckworth wrote:
>>
>> stephan
>>
>> sorry for not being clear - but that's exactly what i want.
>>
>> i'd like to replace every complex string that contains "red" with just
>> "red", and then so on with "blue", "yellow" etc
>>
>> my data is of the form
>>
>> "xxxxx xx xx xxxxx  red xx xxx xx"
>> "xx xxx xxx xx  blue xx xx xx xx x"
>> "x xx xxxxxxxx xx xx xx xxxx red"
>> "red xx xx xx xx xx"
>> "xx xx xx xx xx xx"
>> "xx x x x x xxxx"
>>
>> which i'd like to replace with
>> "red"
>> "blue"
>> "red"
>> "red"
>> "other"
>> "other"
>>
>> thanks
>>
>>
>> On Sat, Mar 28, 2009 at 2:38 PM, Stephan Kolassa <Stephan.Kolassa at gmx.de> wrote:
>>
>>> Hi Simeon,
>>>
>>> I'm slightly unclear on what exactly you are trying to achieve... Are you
>>> trying to replace every entry of colours which *contains* "red" by "red",
>>> dropping the rest of the entry? And same with "blue"?
>>>
>>> A short example "before & after" would be helpful...
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> simeon duckworth wrote:
>>>
>>>> thanks stephan.  i'd been trying to make gsub work, but couldn't make it
>>>> replace the whole expression.  so i'd resorted to trying to loop with
>>>> grep - but with two problems.  firstly, i can't seem to make the loop
>>>> 'remember' the substitutions it makes (see below).  secondly, it feels
>>>> like this is a really inefficient way of doing something quite simple
>>>> anyhow.
>>>>
>>>> colours <- as.character(paste(letters,colours(),"stuff",LETTERS))
>>>> target <- c("red","blue","green","gray")
>>>> new.colour <-colours
>>>> for (i in length(target)) {
>>>>   x <- target[i]
>>>>   new.colour[grep((x),new.colour)] <- x
>>>>   return(new.colour)
>>>>   }
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Mar 28, 2009 at 9:45 AM, Stephan Kolassa <Stephan.Kolassa at gmx.de> wrote:
>>>>
>>>>> Hi Simeon,
>>>>>
>>>>> ?gsub
>>>>>
>>>>> HTH,
>>>>> Stephan
>>>>>
>>>>> simeon duckworth wrote:
>>>>>
>>>>>> I am trying to simplify a text variable by matching and replacing it
>>>>>> with a string in another vector
>>>>>>
>>>>>> so for example in
>>>>>> colours <- paste(letters,colours(),"stuff",LETTERS)
>>>>>>
>>>>>> find and replace with ("red","blue","green","gray","yellow","other") -
>>>>>> irrespective of case
>>>>>>
>>>>>> it's a large dataset, so i'd like to be able to do this as efficiently
>>>>>> as possible.
>>>>>>
>>>>>> thanks for any help
>>>>>>
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>>
>>>>>>
>>
>
>
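
On the loop quoted above that would not 'remember' its substitutions: a likely cause is that for (i in length(target)) runs only once, with i equal to 4, so only "gray" is ever substituted; return() is also only valid inside a function. A rough sketch of the same loop with those two points changed follows. The ignore.case argument and the final "other" step are additions based on the original request, not something the quoted code contained.

```r
# Sketch of the quoted loop with two assumed fixes (not from the thread):
#  - seq_along(target) visits every index; length(target) alone is the
#    single number 4, so only the last target was ever processed
#  - return() is dropped, since it is only valid inside a function
colours <- paste(letters, colours(), "stuff", LETTERS)
target <- c("red", "blue", "green", "gray")
new.colour <- colours
for (i in seq_along(target)) {
  x <- target[i]
  # entries already replaced by an earlier target no longer match later ones
  new.colour[grep(x, new.colour, ignore.case = TRUE)] <- x
}
# anything that matched none of the targets is labelled "other"
new.colour[!new.colour %in% target] <- "other"
table(new.colour)
```

For a large dataset the single vectorised sub() call at the top of this message avoids one pass over the data per target, though the loop is easier to follow.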



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?