[R] Unicode Text Segmentation Algorithms already implemented in R?
wolfer at ids-mannheim.de
Thu Mar 3 10:47:02 CET 2016
Hello list members,
I am looking for an implementation of the Unicode text segmentation (word boundary detection) algorithms in R. The algorithms are described here: http://www.unicode.org/reports/tr29/#Word_Boundaries
The help page for the function 'casefuns' from the excellent 'Unicode' package says: "Other methods will be added eventually (once the Unicode text segmentation algorithm is implemented for detecting word boundaries)." My simple question is: are these algorithms already implemented in an R package? I didn't find anything on the web, but I am counting on the power of this list. My Stata-using colleague is already teasing me about it (in Stata, the function 'ustrword' does exactly what I want to do in R).
Thanks for your help, have a good day, you all!