[R] stringi behaves differently in 2 similar situations
sarah.goslee at gmail.com
Wed Nov 30 22:27:30 CET 2016
A dot is treated differently depending on whether it has a digit on neither, one, or both sides.
> stri_extract_all_words("me.com", simplify = TRUE)
> stri_extract_all_words("me1.com", simplify = TRUE)
[1,] "me1" "com"
> stri_extract_all_words("me1.2com", simplify = TRUE)
sent me to
which suggests that you should spend some time with the user guide:
_Boundary Analysis_ - ICU User Guide, <URL:
Depending on your objective, you might be better off with strsplit()
separating on whitespace.
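
For instance, a minimal sketch of the strsplit() approach, assuming the goal is simply to keep every whitespace-delimited token intact. The example strings are made up for illustration and are not from the original question:

```r
# Hypothetical example strings, not taken from the original post
x <- c("me.com", "watch32.com", "visit me1.2com today")

# Trim leading/trailing whitespace first so no empty token is produced,
# then split on runs of whitespace only; dots and digits are left untouched
tokens <- strsplit(trimws(x), "\\s+")
print(tokens)
```

Because the split pattern matches only whitespace, the dot is never treated as a boundary, whatever sits on either side of it. The trade-off is that any other punctuation attached to a word stays attached as well.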
On Wed, Nov 30, 2016 at 3:51 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> stri_extract_all_words("me.com", simplify = TRUE) # returns with a dot
> stri_extract_all_words("watch32.com", simplify = TRUE) # removes the dot
> Why is the dot removed only in the second case?
> How is it possible to ask it NOT to remove the dot in the second case?
> Thanks a lot!