[R] stringi behaves differently in 2 similar situations

Sarah Goslee sarah.goslee at gmail.com
Wed Nov 30 22:27:30 CET 2016


A dot is treated differently depending on whether it has a digit on
neither side, one side, or both sides.

> stri_extract_all_words("me.com", simplify = TRUE)
     [,1]
[1,] "me.com"
> stri_extract_all_words("me1.com", simplify = TRUE)
     [,1]  [,2]
[1,] "me1" "com"
> stri_extract_all_words("me1.2com", simplify = TRUE)
     [,1]
[1,] "me1.2com"

?stri_extract_all_words

sent me to

?"stringi-search-boundaries"

which suggests that you should spend some time with the user guide:

     _Boundary Analysis_ - ICU User Guide, <URL:
     http://userguide.icu-project.org/boundaryanalysis>


Depending on your objective, you might be better off with strsplit()
separating on whitespace.
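
A minimal sketch of that approach, assuming the input is free text whose
tokens are separated by spaces or tabs. The sample string is made up for
illustration; only the two domain names come from the question.

```r
# Hypothetical input string (an assumption, not from the original post).
x <- "visit watch32.com or me.com today"

# Split on runs of whitespace; dots inside a token are left untouched.
tokens <- strsplit(x, "\\s+")[[1]]
print(tokens)
```

Note that a string with leading whitespace would yield an empty first
element, so trimws() beforehand may be worth considering.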

Sarah

On Wed, Nov 30, 2016 at 3:51 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Hello!
>
> library(stringi)
>
> stri_extract_all_words("me.com", simplify = TRUE)         # returns with a dot
> stri_extract_all_words("watch32.com", simplify = TRUE)  # removes the dot
>
> Why is the dot removed only in the second case?
> How is it possible to ask it NOT to remove the dot in the second case?
>
> Thanks a lot!
>

-- 
Sarah Goslee
http://www.functionaldiversity.org


