[R] need help

Rui Barradas ruipbarradas at sapo.pt
Thu Jun 14 21:36:25 CEST 2012


Hello arun,

Thinking about it, I believe this one is reasonably solid.
I've added a 'txt0' in case the pattern didn't work with a shorter run of repeats.


txt0 <- "my name name is micky"
txt1 <- "my name name name is micky"
txt2 <- "my name name name name is micky"

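# (\\w+\\s) captures a word plus the single whitespace character after it;
# \\1+ then matches one or more immediate copies of that capture
pat <- "(\\w+\\s)\\1+"

# replace each whole run with one copy of the captured word and its space
gsub(pat, "\\1", txt0)
gsub(pat, "\\1", txt1)
gsub(pat, "\\1", txt2)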
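A caveat for anyone reusing the pattern above: (\\w+\\s) is not anchored at the start of a word, so it can match the tail of a longer word ("this is is ok" becomes "this ok"), and a repeat at the very end of a line is left alone because the last word has no trailing whitespace. The following is a minimal sketch of a word-boundary variant using perl = TRUE. It was not posted in the thread, and collapse_repeats is a hypothetical name.

```r
# Hypothetical helper (not from the thread): collapse runs of the same word.
# \\b(\\w+)          captures a whole word that starts at a word boundary
# (?:\\s+\\1\\b)+    matches one or more repeats separated by whitespace;
#                    the trailing \\b stops "name" from matching the start of "names"
collapse_repeats <- function(x) {
  gsub("\\b(\\w+)(?:\\s+\\1\\b)+", "\\1", x, perl = TRUE)
}

collapse_repeats("my name name name is micky")  # "my name is micky"
collapse_repeats("this is is ok")               # "this is ok"
collapse_repeats("is micky micky")              # "is micky"
collapse_repeats("name names")                  # unchanged: "name names"
```

Because gsub() is vectorised, the same call works on a whole character vector, for example the lines returned by readLines().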


Rui Barradas

On 14-06-2012 17:32, arun wrote:
>
> Hi Carlos,
>
> Thanks for your suggestions. I saw Rui's reply about the same problem using rle. It looks very solid. I was trying to replicate the same thing with "gsub", but it did not work that way.
>
> For example,
>   txt1<-"my name name name is micky"
>   gsub("\\b(\\w+)\\b(\\s+)\\1\\2","",txt1)
> [1] "my name is micky"
>   txt2<-"my name name name name is micky"
>   gsub("\\b(\\w+)\\b(\\s+)\\1\\2","",txt2)
> [1] "my is micky"
>
>
> I still think there must be a way in gsub to make it more general.
>
> A.K.
>
> ________________________________
> From: Carlos Ortega <cof at qualityexcellence.es>
> To: arun <smartpink111 at yahoo.com>
> Sent: Thursday, June 14, 2012 12:11 PM
> Subject: Re: [R] need help
>
>
> Hi,
>
> To make it very general and independent of the string "name", you can proceed as follows:
>
> a) For every sentence, make a "table()" and get the word with the highest number of occurrences.
> b) With that word you can follow the procedure I gave you.
>
> This approach is not free of possible errors: it can pick up prepositions (they appear many times in every sentence), so you will have to make the algorithm a little more complex. For example, besides counting the occurrences of every word, also get their lengths, and only change the words with the highest number of occurrences and the greatest lengths... but again, that is not free of possible errors either.
>
> Regex offers a lot of possibilities, no doubt, but I prefer the simplicity that stringr offers.
>
> Regards,
> Carlos Ortega
> www.qualityexcellence.es
>
>
> 2012/6/14 arun <smartpink111 at yahoo.com>
>
>> Hi,
>>
>> For the example you gave, the regex below works:
>>> txt1<-"my name name name is micky"
>>
>>> gsub("\\b(\\w+)\\b(\\s+)\\1\\2","",txt1)
>> [1] "my name is micky"
>>
>> But the expression is not a general one.
>> A.K.
>>
>> ----- Original Message -----
>> From: shilpa rai <raishilpa.bhu at gmail.com>
>> To: r-help at r-project.org
>> Cc:
>> Sent: Wednesday, June 13, 2012 6:56 AM
>> Subject: [R] need help
>>
>> Hello,
>> could you help me solve the following problem?
>>
>> I want to replace identical consecutive words in a sentence with a single word.
>>
>> For example, given: my name name name is micky
>> I want the output to be: my name is micky
>>
>> I want to apply this solution to a text file.
>>
>> Can you tell me the code for it?
>>
>> Thanking you in anticipation,
>>
>> --
>> Shilpa Rai
>> MSc (2011-2013)
>> Applied Statistics and Informatics
>> Indian Institute of Technology, Bombay
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
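Two things in the thread are not shown as code: arun refers to an rle()-based reply from Rui that is not quoted here, and the original question asks how to apply the fix to a text file. The following is a minimal sketch of an rle() approach run line by line over a file. It is not Rui's posted code; collapse_rle and the temporary file are hypothetical stand-ins, and it assumes words are separated by whitespace.

```r
# Sketch only, not Rui's posted code: collapse consecutive duplicate words
# with rle(). Assumes words are separated by whitespace.
collapse_rle <- function(sentence) {
  words <- strsplit(sentence, "\\s+")[[1]]
  # rle() groups consecutive equal values; $values keeps one entry per run
  paste(rle(words)$values, collapse = " ")
}

# Hypothetical input file, created on the fly; use your own path instead.
path <- tempfile(fileext = ".txt")
writeLines(c("my name name name is micky",
             "my name name name name is micky"), path)

# Apply the function to every line of the file
cleaned <- vapply(readLines(path), collapse_rle, character(1),
                  USE.NAMES = FALSE)
cleaned  # "my name is micky" for both lines
```

Because each line is split on whitespace and re-joined with single spaces, this also normalizes spacing, which the gsub() patterns do not do in general. writeLines(cleaned, some_path) would write the result back out.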

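Carlos's steps (a) and (b) can be sketched roughly as follows. The "procedure" he refers to in step (b) is not quoted in this thread, so this sketch substitutes a word-boundary gsub() for it; the function names are hypothetical. As he warns, the most frequent word is not always the repeated one (a preposition may win instead).

```r
# Sketch of Carlos's two steps; function names are hypothetical.
# Step (a): use table() to find the most frequent word in a sentence.
most_frequent_word <- function(sentence) {
  words <- strsplit(sentence, "\\s+")[[1]]
  counts <- table(words)
  # which.max() returns the first maximum; table() orders its names by
  # sorted word, so a tie goes to whichever tied word sorts first
  names(counts)[which.max(counts)]
}

# Step (b): collapse consecutive repeats of that one word only.
# Assumes `word` contains no regex metacharacters.
collapse_word <- function(sentence, word) {
  pat <- paste0("\\b(", word, ")(?:\\s+\\1\\b)+")
  gsub(pat, "\\1", sentence, perl = TRUE)
}

s <- "my name name name is micky"
w <- most_frequent_word(s)  # "name"
collapse_word(s, w)         # "my name is micky"
```

Note that only the chosen word is collapsed: collapse_word("the the cat cat", "the") returns "the cat cat", which illustrates why Carlos considers this approach fragile.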

