Type: | Package |
Title: | Semi-Supervised Model for Geographical Document Classification |
Version: | 0.9.2 |
Maintainer: | Kohei Watanabe <watanabe.kohei@gmail.com> |
Description: | Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Portuguese, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional). |
License: | MIT + file LICENSE |
URL: | https://github.com/koheiw/newsmap |
BugReports: | https://github.com/koheiw/newsmap/issues |
LazyData: | TRUE |
Encoding: | UTF-8 |
Depends: | R (≥ 3.5), methods |
Imports: | utils, Matrix, quanteda (≥ 2.1), quanteda.textstats, stringi |
Suggests: | testthat |
Language: | en-GB |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-07-10 08:04:17 UTC; watan |
Author: | Kohei Watanabe [aut, cre, cph], Stefan Müller [aut], Dani Madrid-Morales [aut], Katerina Tertytchnaya [aut], Ke Cheng [aut], Chung-hong Chan [aut], Claude Grasland [aut], Giuseppe Carteny [aut], Elad Segev [aut], Dai Yamao [aut], Barbara Ellynes Zucchi Nobre Silva [aut], Lanabi la Lova [aut], Lungta Seki [aut] |
Repository: | CRAN |
Date/Publication: | 2025-07-10 12:50:12 UTC |
Evaluate classification accuracy in precision and recall
Description
Evaluate classification accuracy in precision and recall
Usage
accuracy(x, y)
Arguments
x |
vector of predicted classes |
y |
vector of true classes |
Examples
library(newsmap)
class_pred <- c('US', 'GB', 'US', 'CN', 'JP', 'FR', 'CN') # prediction
class_true <- c('US', 'FR', 'US', 'CN', 'KP', 'EG', 'US') # true class
acc <- accuracy(class_pred, class_true)
print(acc)
summary(acc)
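To show what precision and recall mean for each class in the example above, the base-R sketch below tallies true positives, false positives and false negatives per class. `precision_recall_sketch()` is a hypothetical helper written for illustration; it is not the package's `accuracy()`, whose output format may differ.

```r
# Hypothetical helper (not newsmap::accuracy()): per-class precision and recall
precision_recall_sketch <- function(pred, true) {
  classes <- sort(union(pred, true))
  tp <- sapply(classes, function(k) sum(pred == k & true == k))  # predicted k, actually k
  fp <- sapply(classes, function(k) sum(pred == k & true != k))  # predicted k, actually other
  fn <- sapply(classes, function(k) sum(pred != k & true == k))  # actually k, predicted other
  data.frame(class = classes, tp = tp, fp = fp, fn = fn,
             precision = tp / (tp + fp),  # NaN when k is never predicted
             recall = tp / (tp + fn),     # NaN when k never occurs among true classes
             row.names = NULL)
}

class_pred <- c('US', 'GB', 'US', 'CN', 'JP', 'FR', 'CN')
class_true <- c('US', 'FR', 'US', 'CN', 'KP', 'EG', 'US')
precision_recall_sketch(class_pred, class_true)
```

Classes that are never predicted (or never true) yield 0/0, so their precision (or recall) is undefined rather than zero.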
Compute average feature entropy (AFE)
Description
AFE measures the randomness of the occurrences of features in labelled documents.
Usage
afe(x, y, smooth = 1)
Arguments
x |
a dfm for features |
y |
a dfm for labels |
smooth |
a numeric value for smoothing to include all the features |
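As a rough illustration of the idea, the sketch below assumes that AFE is the mean, over features, of the entropy of each feature's smoothed frequency distribution across labels. `afe_sketch()` is hypothetical, and the log base (bits here) and the absence of any weighting are assumptions, not a description of `afe()` itself.

```r
# Hypothetical sketch of an average feature entropy (assumed definition, not newsmap::afe())
afe_sketch <- function(x, y, smooth = 1) {
  # x: document-by-feature counts; y: document-by-label indicators (same documents as rows)
  m <- crossprod(x, y) + smooth   # feature-by-label co-occurrence counts, smoothed
  p <- m / rowSums(m)             # distribution of each feature over labels
  h <- -rowSums(p * log2(p))      # entropy in bits (smooth = 0 can give NaN for zero cells)
  mean(h)                         # average over features
}

x <- matrix(c(1, 1, 2, 0), nrow = 2, dimnames = list(c("d1", "d2"), c("a", "b")))
y <- diag(2); dimnames(y) <- list(c("d1", "d2"), c("A", "B"))
afe_sketch(x, y)  # "a" is spread evenly over labels, "b" is concentrated in label A
```

Under this reading, higher values mean features occur more evenly across labels; with two labels the maximum is 1 bit.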
Coerce various objects to coefficients_textmodel
Description
Coerce various objects to coefficients_textmodel
This is a helper function used in summary.textmodel_*.
Usage
as.coefficients_textmodel(x)
Arguments
x |
an object to be coerced |
Coerce various objects to statistics_textmodel
Description
This is a helper function used in summary.textmodel_*.
Usage
as.statistics_textmodel(x)
Arguments
x |
an object to be coerced |
Assign the summary.textmodel class to a list
Description
Assign the summary.textmodel class to a list
Usage
as.summary.textmodel(x)
Arguments
x |
a named list |
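Assigning a class attribute is what lets R's S3 system dispatch `print()` to a `print.summary.textmodel()` method. The sketch below shows that mechanism with a hypothetical `as_summary_sketch()`; it is an illustration of the idea, not the package's own code.

```r
# Hypothetical helper illustrating the idea behind as.summary.textmodel()
as_summary_sketch <- function(x) {
  stopifnot(is.list(x), !is.null(names(x)))  # expects a named list
  class(x) <- c("summary.textmodel", "list")  # print() now dispatches on "summary.textmodel"
  x
}

s <- as_summary_sketch(list(call = quote(f(x)), n = 10))
inherits(s, "summary.textmodel")
```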
Extract coefficients for features
Description
Extract coefficients for features
Usage
## S3 method for class 'textmodel_newsmap'
coef(object, n = 10, select = NULL, ...)
## S3 method for class 'textmodel_newsmap'
coefficients(object, n = 10, select = NULL, ...)
Arguments
object |
a Newsmap model fitted by textmodel_newsmap(). |
n |
the number of coefficients to extract. |
select |
returns the coefficients for the selected class; specify by the
names of rows in |
... |
not used. |
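The arguments above suggest that coefficients are organised by class, with `select` referring to row names. The sketch below assumes, for illustration only, a class-by-feature coefficient matrix and returns the `n` largest coefficients per class. `top_coef_sketch()` and the toy values are hypothetical; this is not the package's `coef()` method.

```r
# Hypothetical sketch: top-n features per class from a class-by-feature matrix
top_coef_sketch <- function(model, n = 10, select = NULL) {
  if (!is.null(select)) model <- model[select, , drop = FALSE]  # keep chosen classes (row names)
  res <- lapply(rownames(model), function(k) {
    head(sort(model[k, ], decreasing = TRUE), n)  # largest coefficients first
  })
  names(res) <- rownames(model)
  res
}

# toy coefficients for two classes and three features
model <- matrix(c(2.1, -0.5, 0.3, 1.7, -1.2, 0.8), nrow = 2,
                dimnames = list(c("ie", "kr"), c("Ireland", "Korean", "minister")))
top_coef_sketch(model, n = 2, select = "kr")
```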
Seed geographical dictionary in Arabic
Description
Seed geographical dictionary in Arabic
Author(s)
Dai Yamao daiyamao@scs.kyushu-u.ac.jp
Seed geographical dictionary in German
Description
Seed geographical dictionary in German
Author(s)
Stefan Müller mullers@tcd.ie
Seed geographical dictionary in English
Description
Seed geographical dictionary in English
Author(s)
Kohei Watanabe watanabe.kohei@gmail.com
Seed geographical dictionary in Spanish
Description
Seed geographical dictionary in Spanish
Author(s)
Dani Madrid-Morales dani.madrid@my.cityu.edu.hk
Seed geographical dictionary in French
Description
Seed geographical dictionary in French
Author(s)
Claude Grasland claude.grasland@parisgeo.cnrs.fr
Seed geographical dictionary in Hebrew
Description
Seed geographical dictionary in Hebrew
Author(s)
Elad Segev eladseg@gmail.com
Seed geographical dictionary in Italian
Description
Seed geographical dictionary in Italian
Author(s)
Giuseppe Carteny giuseppe.carteny@unimi.it
Seed geographical dictionary in Japanese
Description
Seed geographical dictionary in Japanese
Author(s)
Kohei Watanabe watanabe.kohei@gmail.com
Seed geographical dictionary in Portuguese
Description
Seed geographical dictionary in Portuguese
Author(s)
Barbara Ellynes Zucchi Nobre Silva barbara@zucchi.science
Seed geographical dictionary in Russian
Description
Seed geographical dictionary in Russian
Author(s)
Katerina Tertytchnaya katerina.tertytchnaya@gmail.com
Lanabi la Lova l.lalova@lse.ac.uk
Seed geographical dictionary in Turkish
Description
Seed geographical dictionary in Turkish
Author(s)
Lungta Seki yahoo.co.jp0409@gmail.com
Seed geographical dictionary in Chinese (simplified)
Description
Seed geographical dictionary in Chinese (simplified)
Author(s)
Ke Cheng kecheng.ac@gmail.com
Seed geographical dictionary in Chinese (traditional)
Description
Seed geographical dictionary in Chinese (traditional)
Author(s)
Chung-hong Chan chainsawtiney@gmail.com
Prediction method for textmodel_newsmap
Description
Predict document classes using a trained Newsmap model
Usage
## S3 method for class 'textmodel_newsmap'
predict(
object,
newdata = NULL,
confidence = FALSE,
rank = 1L,
type = c("top", "all"),
rescale = FALSE,
min_conf = -Inf,
min_n = 0L,
...
)
Arguments
object |
a fitted Newsmap textmodel. |
newdata |
dfm on which prediction should be made. |
confidence |
if |
rank |
rank of the class to be predicted. Only used when |
type |
if |
rescale |
if |
min_conf |
return |
min_n |
set the minimum number of polarity words in documents. |
... |
not used. |
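The general idea of ranking classes for each document can be sketched by combining a document's feature counts with class-by-feature weights and picking the class at position `rank`. `predict_sketch()` is hypothetical: the normalisation by matched tokens is an assumption, and the real method's handling of `confidence`, `rescale`, `min_conf` and `min_n` is not reproduced.

```r
# Hypothetical sketch: rank classes per document from a class-by-feature weight matrix
predict_sketch <- function(x, model, rank = 1L) {
  # x: document-by-feature counts; model: class-by-feature weights
  feats <- intersect(colnames(x), colnames(model))  # features known to the model
  score <- x[, feats, drop = FALSE] %*% t(model[, feats, drop = FALSE])
  score <- score / pmax(rowSums(x[, feats, drop = FALSE]), 1)  # mean weight per matched token
  apply(score, 1, function(s) names(sort(s, decreasing = TRUE))[rank])
}

# toy weights and documents
model <- matrix(c(2.1, -0.5, 0.3, 1.7, -1.2, 0.8), nrow = 2,
                dimnames = list(c("ie", "kr"), c("Ireland", "Korean", "minister")))
x <- matrix(c(1, 0, 0, 1, 0, 1), nrow = 2,
            dimnames = list(c("text1", "text2"), c("Ireland", "Korean", "minister")))
predict_sketch(x, model)             # most likely class per document
predict_sketch(x, model, rank = 2L)  # second most likely class
```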
Print methods for textmodel feature estimates
Description
Print methods for textmodel feature estimates
This is a helper function used in print.summary.textmodel.
Usage
## S3 method for class 'coefficients_textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
a coefficients_textmodel object |
digits |
minimal number of significant digits, see print.default() |
... |
additional arguments not used |
Print method for statistics_textmodel
Description
Print method for statistics_textmodel
Usage
## S3 method for class 'statistics_textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
a statistics_textmodel object |
digits |
minimal number of significant digits, see print.default() |
... |
further arguments passed to or from other methods |
Print method for summary.textmodel
Description
Print method for summary.textmodel
Usage
## S3 method for class 'summary.textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
a summary.textmodel object |
digits |
minimal number of significant digits, see print.default() |
... |
additional arguments not used |
Calculate micro and macro average measures of accuracy
Description
This function calculates micro-average precision (p) and recall (r) and
macro-average precision (P) and recall (R) based on a confusion matrix from
accuracy().
Usage
## S3 method for class 'textmodel_newsmap_accuracy'
summary(object, ...)
Arguments
object |
output of accuracy() |
... |
not used. |
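The difference between the two averages can be shown with per-class counts tallied by hand from the accuracy() example (classes CN, EG, FR, GB, JP, KP, US). Micro-averaging pools the counts before dividing; macro-averaging divides per class and then takes the mean. `average_sketch()` is hypothetical, and dropping classes whose precision or recall is undefined (0/0) is an assumption about how such classes are treated.

```r
# Hypothetical sketch: micro- and macro-averaged precision and recall
average_sketch <- function(tp, fp, fn) {
  p <- sum(tp) / sum(tp + fp)              # micro: pool counts over classes first
  r <- sum(tp) / sum(tp + fn)
  P <- mean(tp / (tp + fp), na.rm = TRUE)  # macro: mean of per-class scores
  R <- mean(tp / (tp + fn), na.rm = TRUE)  # na.rm drops undefined (NaN) classes
  c(p = p, r = r, P = P, R = R)
}

tp <- c(CN = 1, EG = 0, FR = 0, GB = 0, JP = 0, KP = 0, US = 2)
fp <- c(CN = 1, EG = 0, FR = 1, GB = 1, JP = 1, KP = 0, US = 0)
fn <- c(CN = 0, EG = 1, FR = 1, GB = 0, JP = 0, KP = 1, US = 1)
average_sketch(tp, fp, fn)
```

When every document has exactly one predicted and one true class, sum(tp + fp) and sum(tp + fn) both equal the number of documents, so micro precision and micro recall coincide.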
Semi-supervised Bayesian multinomial model for geographical document classification
Description
Train a Newsmap model to predict the geographical focus of documents with labels given by a dictionary.
Usage
textmodel_newsmap(
x,
y,
label = c("all", "max"),
smooth = 1,
boolean = FALSE,
drop_label = TRUE,
verbose = quanteda_options("verbose"),
entropy = c("none", "global", "local", "average"),
...
)
Arguments
x |
a dfm or fcm created by |
y |
a dfm or a sparse matrix that records the class membership of the
documents. It can be created by applying |
label |
if "max", uses only labels for the maximum value in each row of y. |
smooth |
a value added to the frequency of words to smooth likelihood ratios. |
boolean |
if |
drop_label |
if |
verbose |
if |
entropy |
[experimental] the scheme to compute the entropy to
regularize likelihood ratios. The entropy of features is computed over
labels if |
... |
additional arguments passed to internal functions. |
Details
Newsmap learns associations between words and classes as likelihood
ratios based on the features in x and the labels in y. Large
likelihood ratios tend to concentrate on a small number of features, but the
entropy of their frequencies over labels or documents helps to disperse the
distribution.
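To make the notion of a smoothed likelihood ratio concrete, the sketch below assumes that each feature's smoothed relative frequency within a class is compared, on the log scale, with its relative frequency in all other classes. `llr_sketch()` is hypothetical; the exact formula used by textmodel_newsmap(), including the optional entropy weighting, is not reproduced here (see Watanabe 2018).

```r
# Hypothetical sketch: smoothed log likelihood ratios of features for each class
llr_sketch <- function(counts, smooth = 1) {
  # counts: class-by-feature frequencies aggregated over labelled documents
  counts <- counts + smooth     # avoid zero frequencies
  total <- colSums(counts)      # frequency of each feature over all classes
  res <- counts
  for (k in seq_len(nrow(counts))) {
    in_k <- counts[k, ]         # frequencies within class k
    out_k <- total - in_k       # frequencies in all other classes
    res[k, ] <- log(in_k / sum(in_k)) - log(out_k / sum(out_k))
  }
  res
}

counts <- matrix(c(3, 1, 1, 3), nrow = 2,
                 dimnames = list(c("ie", "kr"), c("Ireland", "Korean")))
llr_sketch(counts)
```

Positive values mark features that are relatively more frequent in a class than elsewhere; the smoothing value pulls ratios for rare features towards zero.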
References
Kohei Watanabe. 2018. "Newsmap: semi-supervised approach to geographical news classification." Digital Journalism 6(3): 294-309.
Examples
require(quanteda)
require(newsmap)
# two short English documents
text_en <- c(text1 = "This is an article about Ireland.",
             text2 = "The South Korean prime minister was re-elected.")
toks_en <- tokens(text_en)
# label documents with country-level entries of the seed dictionary (level 3)
label_toks_en <- tokens_lookup(toks_en, data_dictionary_newsmap_en, levels = 3)
label_dfm_en <- dfm(label_toks_en)
# keep capitalisation so that proper nouns remain distinct features
feat_dfm_en <- dfm(toks_en, tolower = FALSE)
model_en <- textmodel_newsmap(feat_dfm_en, label_dfm_en)
predict(model_en)