[R] Omitting repeated occurrence in a string

David Winsemius dwinsemius at comcast.net
Wed Feb 6 20:59:27 CET 2013


On Feb 6, 2013, at 11:24 AM, David Winsemius wrote:

> 
> On Feb 6, 2013, at 8:46 AM, Christofer Bogaso wrote:
> 
>> Hello again,
>> 
>> I was looking for a way to delete repeated appearances in a
>> string. Let's say I have the following string:
>> 
>> Text <- "ahsgdvasgAbcabcsdahj"
>> 
>> Here you see "Abc" appears twice. But I want to keep only 1
>> occurrence. Therefore I need that:
>> 
>> Text_result <- "ahsgdvasgAbcsdahj" (i.e. the first one).
>> 
>> Can somebody help me if it is possible using some R function?
> 
> This is not going to solve all possible variations of this problem, but then your proposed test suite was rather limited ... don't you agree?
> 
>> Text <- "ahsgdvasgAbcabcsdahabcj"
>> gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
> [1] "ahsgdvasgAbcj"
> 

This gives some further variations:

> Text <- "ahsgdvasgAbcabcsdahabcj"  #adding a third instance
> gsub("(abc).*(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcj"
# The first strategy deletes everything after the first 'abc', up to and through
# the last 'abc', because .* is greedy
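
As a hedged illustration of why the first pattern overshoots: ".*" is greedy and stretches to the last 'abc', whereas a lazy ".*?" stops at the nearest one. The sketch below assumes perl = TRUE for the lazy quantifier; the variable names greedy and lazy are only for illustration.

```r
Text <- "ahsgdvasgAbcabcsdahabcj"

# Greedy: .* runs as far as it can, so the match ends at the LAST 'abc'
greedy <- gsub("(abc).*(abc)", "\\1", Text, ignore.case = TRUE)

# Lazy: .*? stops at the NEAREST following 'abc'
lazy <- gsub("(abc).*?(abc)", "\\1", Text, ignore.case = TRUE, perl = TRUE)

print(greedy)  # [1] "ahsgdvasgAbcj"
print(lazy)    # [1] "ahsgdvasgAbcsdahabcj"
```

The lazy version drops only the second 'abc'; the third survives because no further 'abc' follows it to complete another match.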


> gsub("(abc)((.*)(abc))", "\\1\\2", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcabcsdahabcj"
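# nested parentheses do work, but \\2 is the outer group and already contains
# the inner (.*) and the final (abc), so \\1\\2 rebuilds the whole match unchanged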
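
To see what each numbered group holds in the nested pattern, one option is to inspect the captures with regexec() and regmatches(). This is only a sketch; the variable names m and groups are illustrative.

```r
Text <- "ahsgdvasgAbcabcsdahabcj"

# regexec() reports the full match followed by each parenthesised group,
# numbered by the position of its opening parenthesis
m <- regexec("(abc)((.*)(abc))", Text, ignore.case = TRUE)
groups <- regmatches(Text, m)[[1]]
print(groups)
# [1] "Abcabcsdahabc" "Abc"           "abcsdahabc"    "abcsdah"       "abc"
```

Element 1 is the whole match, element 2 is \\1 and element 3 is \\2. Pasting \\1 and \\2 together gives back the whole match, which is why that gsub() call returns the string unchanged.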

> gsub("(abc)(abc)", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
# Removes only an immediately adjacent repeat; the separated third 'abc' stays


> Text
[1] "ahsgdvasgAbcabcsdahabcj"
> gsub("(abc)(.?)(abc)", "\\1\\2", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
# '.?' allows at most one intervening character, so again only the adjacent repeat is removed

# This gets rid of all sequential repeats, but not separated ones
> Text <- "ahsgdvasgAbcabcabcabcabcsdahabcj"
> gsub("(abc)(abc)*", "\\1", Text, ignore.case=TRUE)
[1] "ahsgdvasgAbcsdahabcj"
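
None of the patterns above removes repeats that are separated by other text while keeping the first occurrence. A minimal sketch of one way to do that, assuming the target is a simple pattern matched case-insensitively: locate the first match with regexpr(), keep everything through it, and strip the pattern from the remainder. The name keep_first is hypothetical, not a base R function.

```r
# Hypothetical helper: keep the first match of `pat` in `x`, drop all later ones
keep_first <- function(x, pat) {
  m <- regexpr(pat, x, ignore.case = TRUE)
  if (m == -1) return(x)                        # no match: return input as-is
  last <- m + attr(m, "match.length") - 1       # index of the first match's last character
  kept <- substr(x, 1, last)                    # text up to and including the first match
  rest <- substr(x, last + 1, nchar(x))         # everything after the first match
  paste0(kept, gsub(pat, "", rest, ignore.case = TRUE))
}

Text <- "ahsgdvasgAbcabcsdahabcj"
result <- keep_first(Text, "abc")
print(result)  # [1] "ahsgdvasgAbcsdahj"
```

This handles one string at a time; a vector of strings would need something like vapply() around it.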



David Winsemius
Alameda, CA, USA


