[R] regexpr - ignore all special characters and punctuation in a string

Charles Determan cdetermanjr at gmail.com
Mon Apr 20 16:15:18 CEST 2015


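You can use the [:alnum:] regex class with gsub: the negated class
[^[:alnum:]] matches everything that is not a letter or a digit (including
spaces), so replacing it with "" leaves only the characters you want to compare.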

str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"

gsub("[^[:alnum:]]", "", str1) == gsub("[^[:alnum:]]", "", str2)
[1] TRUE
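
If you need this comparison in more than one place, it could be wrapped in a
small helper. This is only a sketch: the name same_alnum and the ignore_case
argument are made up for illustration and do not come from any package. Keep
in mind that what [:alnum:] matches depends on the locale, whereas
[A-Za-z0-9] is limited to ASCII letters and digits.

```r
# Hypothetical helper (name and arguments are illustrative only).
# Strips everything that is not a letter or digit, then compares.
same_alnum <- function(x, y, ignore_case = FALSE) {
  clean <- function(s) gsub("[^[:alnum:]]", "", s)
  x <- clean(x)
  y <- clean(y)
  if (ignore_case) {
    x <- tolower(x)
    y <- tolower(y)
  }
  x == y
}

str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"

same_alnum(str1, str2)
# [1] TRUE

# Replacing punctuation with a space instead leaves unequal runs of spaces,
# so the strings still differ:
gsub("[[:punct:]]", " ", str1) == gsub("[[:punct:]]", " ", str2)
# [1] FALSE
```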

The same can be done with the stringr package if you really are partial to
it.

library(stringr)
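
A sketch of how the stringr version might look, assuming the stringr package
is installed. str_replace_all(string, pattern, replacement) is stringr's
counterpart to gsub; the strings are the same ones used above, and the
pattern is assumed to carry over unchanged.

```r
library(stringr)

str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"

# Drop every character that is not a letter or digit, then compare.
clean1 <- str_replace_all(str1, "[^[:alnum:]]", "")
clean2 <- str_replace_all(str2, "[^[:alnum:]]", "")

clean1 == clean2
# [1] TRUE
```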





On Mon, Apr 20, 2015 at 9:10 AM, Sven E. Templer <sven.templer at gmail.com>
wrote:

> Hi Dimitri,
>
> str_replace_all is not in base R (it comes from the stringr package); you
> could use 'gsub' as well, for example:
>
> a = "What a nice day today! - Story of happiness: Part 2."
> b = "What a nice day today: Story of happiness (Part 2)"
> sa = gsub("[^A-Za-z0-9]", "", a)
> sb = gsub("[^A-Za-z0-9]", "", b)
> a==b
> # [1] FALSE
> sa==sb
> # [1] TRUE
>
> Note the extra space in a after the '-': spaces have to be removed as well,
> which the pattern above already does, since a space is not in A-Za-z0-9.
>
> Best,
> Sven.
>
> On 20 April 2015 at 16:05, Dimitri Liakhovitski <
> dimitri.liakhovitski at gmail.com> wrote:
>
> > I think I found a partial answer:
> >
> > str_replace_all(x, "[[:punct:]]", " ")
> >
> > On Mon, Apr 20, 2015 at 9:59 AM, Dimitri Liakhovitski
> > <dimitri.liakhovitski at gmail.com> wrote:
> > > Hello!
> > >
> > > Please point me in the right direction.
> > > I need to match 2 strings, but focusing ONLY on characters, ignoring
> > > all special characters and punctuation marks, including (), "", etc.
> > >
> > > For example:
> > > I want the following to return: TRUE
> > >
> > > "What a nice day today! - Story of happiness: Part 2." ==
> > >    "What a nice day today: Story of happiness (Part 2)"
> > >
> > >
> > > --
> > > Thank you!
> > > Dimitri Liakhovitski
> >
> >
> >
> > --
> > Dimitri Liakhovitski
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
