[R] remove Punctuation characters

Marc Schwartz (via MN) mschwartz at mn.rr.com
Tue May 9 18:00:41 CEST 2006


On Tue, 2006-05-09 at 16:50 +0100, Filipe Almeida wrote:
> Hi,
> 
> I want to remove all punctuation characters in a string. I was trying to use
> a regular expression but it doesn't work.
> Here is a sample of what I want:
> 
> str <- 'ABD - remove de punct, and dot characters.'
> str <- gsub('[:punct:]','',str)
> str
> "ABD remove de punct and dot characters"
> 
> Is there any function that does this kind of thing?
> 
> Thanks to all.
> 
> Filipe Almeida

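You almost have it.  You just need to double the brackets: the outer
pair delimits the bracket list, and the inner [:punct:] is the name of
the character class itself: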

> str
[1] "ABD - remove de punct, and dot characters."

> gsub("[[:punct:]]", "", str)
[1] "ABD  remove de punct and dot characters"
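
As a minimal sketch of the same fix in script form: removing the "-"
leaves two adjacent spaces (visible in the output above), so a second,
optional gsub() call can collapse them. The variable names no_punct and
tidy are just illustrative, and the space-collapsing step is an addition
that was not part of the original question.

```r
# The string from the original question.
str <- "ABD - remove de punct, and dot characters."

# Double brackets: a bracket list whose only member is the punct class.
no_punct <- gsub("[[:punct:]]", "", str)

# Optional extra step (an assumption about what is wanted):
# collapse runs of spaces left behind where the "-" used to be.
tidy <- gsub(" +", " ", no_punct)

print(no_punct)
print(tidy)
```

If the doubled space does not matter for your purposes, the first
gsub() call alone is enough.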


Note the following in ?regex:

For example, [[:alnum:]] means [0-9A-Za-z], except the latter depends
upon the locale and the character encoding, whereas the former is
independent of locale and character set. (Note that the brackets in
these class names are part of the symbolic names, and must be included
in addition to the brackets delimiting the bracket list.) Most
metacharacters lose their special meaning inside lists. To include a
literal ], place it first in the list. Similarly, to include a literal
^, place it anywhere but first. Finally, to include a literal -, place
it first or last. (Only these and \ remain special inside character
classes.)
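
The placement rules quoted above can be tried out with a short sketch.
The test string x and the result names are made up for illustration;
only the positions of ], ^ and - inside the brackets matter.

```r
# A made-up string containing the three characters discussed in ?regex.
x <- "a]b^c-d"

# A literal ] is placed first in the list.
only_bracket <- gsub("[]]", "", x)

# A literal ^ is placed anywhere but first (here after "a").
a_and_caret <- gsub("[a^]", "", x)

# A literal - is placed first or last (here last).
a_and_dash <- gsub("[a-]", "", x)

# All three at once: ] first, ^ in the middle, - last.
all_three <- gsub("[]^-]", "", x)

print(c(only_bracket, a_and_caret, a_and_dash, all_three))
```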

HTH,

Marc Schwartz



