[R] test if elements of a character vector contain letters

David L Carlson dcarlson at tamu.edu
Mon Aug 6 19:39:06 CEST 2012


All it takes is an extra set of brackets:

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

Without them, the functions are fast, but wrong.
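
For what it's worth, the reason is that a POSIX class name such as [:alpha:] is only special inside an enclosing pair of brackets. On its own, "[:alpha:]" is read as an ordinary set of the literal characters ":", "a", "l", "p" and "h". A small sketch that spells that set out explicitly (the names literal_set and posix_class are made up for illustration) and lands on the same TRUE positions as the session that follows:

```r
# Test vector copied from the session in this message.
x <- c("a8", "b5", "c10", "d1", "e6", "f2", "g4", "h3", "i7", "j9",
       letters[11:26], as.character(1:26))

# "[ahlp:]" writes out the set that the single-bracket "[:alpha:]"
# describes: the literal characters ":", "a", "l", "p" and "h".
literal_set <- grepl("[ahlp:]", x)

# The doubled brackets put the POSIX class inside a bracket expression.
posix_class <- grepl("[[:alpha:]]", x)

which(literal_set)   # 1 8 12 16: only elements containing a, h, l or p
which(posix_class)   # 1 to 26: every element that contains a letter
```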

> x
 [1] "a8"  "b5"  "c10" "d1"  "e6"  "f2"  "g4"  "h3"  "i7"  "j9"  "k"   "l"  
[13] "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"  
[25] "y"   "z"   "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10" 
[37] "11"  "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22" 
[49] "23"  "24"  "25"  "26" 
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.letter(x)
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[13] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE
> is.letter <- function(x) grepl("[[:alpha:]]", x)
> is.letter(x)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE 
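
As a sanity check, the corrected pattern can be compared against a byte-range test adapted from Rui's has_letter() quoted below. This is only a sketch under a few assumptions: vapply() is substituted for sapply() here to get an unnamed logical vector, the digits pasted onto the first ten letters are fixed rather than drawn with sample(), and agreement is only expected for plain ASCII input, since [[:alpha:]] is locale-aware while the byte ranges cover A-Z and a-z only.

```r
is.letter <- function(x) grepl("[[:alpha:]]", x)

# Byte-range test adapted from has_letter() in the quoted message;
# vapply() is a substitution made here so the result type is fixed.
has_letter <- function(x) {
  vapply(x, function(y) {
    y <- as.integer(charToRaw(y))
    any((65 <= y & y <= 90) | (97 <= y & y <= 122))
  }, logical(1), USE.NAMES = FALSE)
}

# Same shape as the thread's test vector, with fixed digits.
x <- c(paste0(letters[1:10], 1:10), letters[11:26], as.character(1:26))

# Both approaches should flag exactly the same elements.
identical(is.letter(x), has_letter(x))
```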

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Marc Schwartz
> Sent: Monday, August 06, 2012 12:07 PM
> To: Rui Barradas
> Cc: r-help
> Subject: Re: [R] test if elements of a character vector contain letters
> 
> Perhaps I am missing something, but why use sapply() when grepl() is
> already vectorized?
> 
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.number <- function(x) grepl("[:digit:]", x)
> 
> x <- c(letters, 1:26)
> 
> x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
> 
> x <- rep(x, 1e3)
> 
> > str(x)
>  chr [1:52000] "a2" "b10" "c8" "d3" "e6" "f1" "g5" ...
> 
> > system.time(is.letter(x))
>    user  system elapsed
>   0.011   0.000   0.010
> 
> > system.time(is.number(x))
>    user  system elapsed
>   0.010   0.000   0.011
> 
> 
> Regards,
> 
> Marc Schwartz
> 
> On Aug 6, 2012, at 11:51 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> 
> > Hello,
> >
> > Fun as an exercise in vectorization. 30 times faster. Don't look, guess.
> >
> > Given up? OK, here it is.
> >
> >
> > is_letter <- function(x, pattern=c(letters, LETTERS)){
> >    sapply(x, function(y){
> >        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
> >    })
> > }
> > # test ascii codes, just one loop.
> > has_letter <- function(x){
> >    sapply(x, function(y){
> >        y <- as.integer(charToRaw(y))
> >        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
> >    })
> > }
> >
> > x <- c(letters, 1:26)
> > x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
> > x <- rep(x, 1e3)
> >
> > t1 <- system.time(is_letter(x))
> > t2 <- system.time(has_letter(x))
> > rbind(t1, t2, t1/t2)
> >   user.self sys.self elapsed user.child sys.child
> > t1     15.69        0   15.74         NA        NA
> > t2      0.50        0    0.50         NA        NA
> >       31.38      NaN   31.48         NA        NA
> >
> >
> > Em 06-08-2012 17:25, Liviu Andronic escreveu:
> >> Dear all
> >> I'm pretty sure that I'm approaching the problem in a wrong way.
> >> Suppose the following character vector:
> >>> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
> >>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
> >>> x
> >>  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
> >> [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
> >> [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
> >> [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
> >>
> >>
> >> How do you test whether the elements of the vector contain at least
> >> one letter (or at least one digit) and obtain a logical vector of the
> >> same length? I came up with the following awkward function:
> >> is_letter <- function(x, pattern=c(letters, LETTERS)){
> >>     sapply(x, function(y){
> >>         any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
> >>     })
> >> }
> >>
> >>> is_letter(x)
> >>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
> >>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> >>    20    21    22    23    24    25    26
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> >>> is_letter(x, 0:9)  ##function slightly misnamed
> >>   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
> >>     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
> >> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> >>     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>    20    21    22    23    24    25    26
> >>  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> >>
> >>
> >> Is there a nicer way to do this? Regards
> >> Liviu
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


