Segments short text sequences, such as hashtags, into separate words using a dictionary, which may be built from a custom corpus of texts. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine the possible segmentations given the text corpus.
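The unigram step described above can be illustrated with a small dynamic-programming sketch. This is not the package's implementation or its R API: the function name `segment`, the `max_word_len` parameter, and the toy counts are all hypothetical and chosen only for illustration. It assumes a dictionary that maps words to raw counts and scores a segmentation by the sum of log unigram probabilities.

```python
import math

def segment(text, unigram_counts, max_word_len=20):
    """Return the most probable split of `text` into dictionary words,
    or None if no split uses dictionary words only (hypothetical helper)."""
    total = sum(unigram_counts.values())
    n = len(text)
    # best[i] holds (best log-probability of text[:i], start of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            count = unigram_counts.get(text[start:end])
            if count is None or best[start][0] == -math.inf:
                continue  # unknown word, or the prefix cannot be segmented
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk the back-pointers from the end of the text to recover the words
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]

# Toy counts, invented for this example only
counts = {"this": 50, "is": 80, "a": 120, "hash": 10,
          "tag": 12, "hashtag": 8, "his": 20, "t": 1}
result = segment("thisisahashtag", counts)
print(result)
```

With these toy counts the single word "hashtag" scores higher than "hash" followed by "tag", because splitting multiplies two small probabilities. A real dictionary built from a corpus would determine which reading wins.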
| Version: | 0.1.0 | 
| Depends: | R (≥ 3.5) | 
| Imports: | dplyr, magrittr, Rcpp, stringr, text2vec, textclean, utils | 
| LinkingTo: | BH, Rcpp | 
| Suggests: | testthat (≥ 3.0.0) | 
| Published: | 2024-08-19 | 
| DOI: | 10.32614/CRAN.package.NUSS | 
| Author: | Oskar Kosch | 
| Maintainer: | Oskar Kosch <contact at oskarkosch.com> | 
| BugReports: | https://github.com/theogrost/NUSS/issues | 
| License: | GPL (≥ 3) | 
| URL: | https://github.com/theogrost/NUSS | 
| NeedsCompilation: | yes | 
| Language: | en | 
| Materials: | README | 
| CRAN checks: | NUSS results | 
| Reference manual: | NUSS.html , NUSS.pdf | 
| Package source: | NUSS_0.1.0.tar.gz | 
| Windows binaries: | r-devel: NUSS_0.1.0.zip, r-release: NUSS_0.1.0.zip, r-oldrel: NUSS_0.1.0.zip | 
| macOS binaries: | r-release (arm64): NUSS_0.1.0.tgz, r-oldrel (arm64): NUSS_0.1.0.tgz, r-release (x86_64): NUSS_0.1.0.tgz, r-oldrel (x86_64): NUSS_0.1.0.tgz | 
Please use the canonical form https://CRAN.R-project.org/package=NUSS to link to this page.