Type: Package
Title: Bayes Classifier for Verbal Autopsy Data
Version: 1.2
Date: 2022-04-25
Maintainer: Richard Wen <rrwen.dev@gmail.com>
Description: An implementation of the Naive Bayes Classifier (NBC) algorithm used for Verbal Autopsy (VA) built on code from Miasnikof et al (2015) <doi:10.1186/s12916-015-0521-2>.
Depends: R (>= 4.0.0)
Imports: graphics, methods, utils, shiny
Suggests: bookdown, knitr, rmarkdown, testthat
Enhances: openVA
License: GPL-3
LazyData: TRUE
RoxygenNote: 7.1.2
Collate: 'nbc4va.R' 'nbc4va_data.R' 'nbc4va_validation.R' 'nbc4va_internal.R' 'nbc4va_main.R' 'nbc4va_extra.R' 'nbc4va_utility.R' 'nbc4va_wrapper.R'
VignetteBuilder: knitr
Encoding: UTF-8
NeedsCompilation: no
Packaged: 2022-05-09 12:05:12 UTC; rrwen
Author: Richard Wen [aut, cre], Pierre Miasnikof [ctb], Vasily Giannakeas [ctb], Mireille Gomes [ctb]
Repository: CRAN
Date/Publication: 2022-05-10 10:40:02 UTC

Calculate predicted CSMFs from a NBC model

Description

Obtains the predicted Cause Specific Mortality Fraction (CSMF) from a result nbc object.

Usage

csmf.nbc(object)

Arguments

object

The result nbc object.

Value

out A numeric vector of the predicted CSMFs in which the names are the corresponding causes.
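
To make the shape of the return value concrete, the following base R sketch builds a named vector of cause fractions from a plain character vector of predicted causes. It does not call csmf.nbc(); the helper name csmf_from_pred and the input vector are hypothetical and only illustrate what "a numeric vector whose names are the causes" looks like.

```r
# Hypothetical helper (not part of nbc4va): fraction of cases assigned to each cause
csmf_from_pred <- function(pred) {
  counts <- table(pred)                      # number of cases per predicted cause
  csmf <- as.numeric(counts) / length(pred)  # convert counts to fractions
  names(csmf) <- names(counts)               # name each fraction by its cause
  csmf
}

# Illustrative input only
pred <- c("HIV", "Stroke", "HIV", "HIV")
predCSMF <- csmf_from_pred(pred)
```

The fractions sum to 1 across causes, so each element can be read as the share of test cases attributed to that cause.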

See Also

Other wrapper functions: topCOD.nbc()

Examples

library(nbc4va)
data(nbc4vaData)

# Run naive bayes classifier on random train and test data
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test)

# Obtain the predicted CSMFs
predCSMF <- csmf.nbc(results)


Check arguments for nbc()

Description

Performs checks to ensure that the arguments passed to internalNBC are correct. This function will also auto-clean when appropriate, and display warning messages of the cleaning tasks.

Usage

internalCheckNBC(train, test, known = TRUE, assume = FALSE, unknown = 99)

Arguments

train

Dataframe of verbal autopsy train data (See Data documentation).

  • Columns (in order): ID, Cause, Symptom-1 to Symptom-n..

  • ID (vectorof char): unique case identifiers

  • Cause (vectorof char): observed causes for each case

  • Symptom-n.. (vectorsof (1 OR 0)): 1 for presence, 0 for absence, other values are treated as unknown

  • Unknown symptoms are imputed randomly from distributions of 1s and 0s per symptom column; if no 1s or 0s exist then the column is removed

Example:

ID Cause S1 S2 S3
"a1" "HIV" 1 0 0
"b2" "Stroke" 0 0 1
"c3" "HIV" 1 1 0

test

Dataframe of verbal autopsy test data in the same format as train except if causes are not known:

  • The 2nd column (Cause) can be omitted if known is FALSE

known

TRUE to indicate that the test causes are available in the 2nd column and FALSE to indicate that they are not known

assume

TRUE to set all symptoms not equal to 1 as 0 and FALSE to raise error if symptoms are not 0 or 1. This takes priority over unknown.

unknown

A single integer value that marks a symptom as unknown, i.e. neither confirmed present nor confirmed absent.

  • The unknown values are substituted according to the proportion of the 1s and 0s per column

  • Setting this to NULL will ignore this substitution

  • All other values that are not the unknown value or 1 will be set to 0 after the substitution
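
The effect described for assume = TRUE can be pictured with a small base R sketch. This is not the package's internal code: the data frame is made up, and treating NA like any other value that is not 1 is an assumption made here for illustration.

```r
# Hypothetical symptom columns containing values other than 0 and 1
symptoms <- data.frame(S1 = c(1, 0, 99), S2 = c(1, NA, 2))

# Sketch of assume = TRUE: every value not equal to 1 becomes 0
# (assumption: NA is also treated as "not equal to 1")
assumed <- as.data.frame(lapply(symptoms, function(col) {
  ifelse(!is.na(col) & col == 1, 1, 0)
}))
```

With assume = FALSE, the same input would instead be expected to raise an error or go through the unknown substitution, as described in the arguments above.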

Details

The following checks are applied to train and test to ensure they:

Value

out A list object containing the checked inputs:

See Also

Other validation functions: internalCheckNBCSummary()

Examples

library(nbc4va)
data(nbc4vaData)

# Check train and test inputs, error if it does not pass check
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
checked <- nbc4va::internalCheckNBC(train, test)
train <- checked$train
test <- checked$test


Check arguments for summary.nbc()

Description

Performs checks to ensure that the arguments passed to summary.nbc are correct. This function will perform automatic data type conversions, and display warnings when appropriate.

Usage

internalCheckNBCSummary(object, top = 5, id = NULL, csmfa.obs = NULL, ...)

Arguments

object

The result nbc object.

top

A number that produces top causes depending on id:

  • If (id is char): provide the top causes of the case by probability

  • If (id is NULL): provide the top causes by predicted Cause Specific Mortality Fractions (CSMF)

id

A character representing a case id in the test data.

csmfa.obs

A character vector of the true causes for calculating the CSMF accuracy.

...

Additional arguments to be passed if applicable

Details

The following checks are applied:

Value

out A list object containing the checked inputs:

See Also

Other validation functions: internalCheckNBC()

Examples

library(nbc4va)
data(nbc4vaData)

# Create an nbc
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test)

# Check the inputs before passing on to summary
checked <- nbc4va::internalCheckNBCSummary(results, 5, "g85")
results <- checked$object
top <- checked$top
id <- checked$id
csmfa.obs <- checked$csmfa.obs


Calculate CSMF accuracy

Description

Calculates the overall CSMF accuracy given any number of predicted cases and any number of observed cases.

Usage

internalGetCSMFAcc(pred, obs)

Arguments

pred

Character vector of predicted causes for each case.

obs

Character vector of observed causes for each case.

Value

csmfa Numeric value of the overall CSMF accuracy (see Methods documentation).
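
For orientation, the sketch below implements the commonly used definition of CSMF accuracy: one minus the summed absolute differences between observed and predicted CSMFs, divided by the maximum possible error. It is written in base R under the assumption that this definition matches the package's Methods documentation, which remains the authoritative source; the function name csmf_accuracy is hypothetical.

```r
# Hypothetical re-implementation for illustration (not the package's code)
csmf_accuracy <- function(pred, obs) {
  causes <- unique(c(pred, obs))
  # CSMF per cause, aligned on the same cause order
  csmf_pred <- as.numeric(table(factor(pred, levels = causes))) / length(pred)
  csmf_obs <- as.numeric(table(factor(obs, levels = causes))) / length(obs)
  # Assumption: the minimum is taken over causes that occur in obs
  max_error <- 2 * (1 - min(table(obs) / length(obs)))
  1 - sum(abs(csmf_obs - csmf_pred)) / max_error
}

pred <- c("HIV", "HIV", "HIV", "Stroke")
obs <- c("HIV", "HIV", "Stroke", "Stroke")
csmfa <- csmf_accuracy(pred, obs)
```

Here the predicted fractions are 0.75 and 0.25 against observed fractions of 0.5 each, giving a total absolute error of 0.5 and a maximum error of 1.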

See Also

Other internal functions: internalGetCSMFMaxError(), internalGetCauseMetrics(), internalGetMetrics(), internalNBC()

Examples

library(nbc4va)
pred <- c("HIV", "Stroke", "HIV", "Stroke")
obs <- c("HIV", "HIV", "Stroke", "Stroke")
csmfa <- nbc4va::internalGetCSMFAcc(pred, obs)


Calculate CSMF maximum error

Description

Calculates the CSMF maximum error given a set of observed cases.

Usage

internalGetCSMFMaxError(obs)

Arguments

obs

Character vector of observed causes for each case.

Value

csmfMaxError Numeric value of the CSMF maximum error (see Methods documentation).
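
The maximum error is the worst case in which every death is assigned to the least common observed cause. A minimal base R sketch of that idea follows, assuming the usual formula of twice one minus the smallest observed CSMF; csmf_max_error is a hypothetical name, and the Methods documentation should be consulted for the package's exact definition.

```r
# Hypothetical helper for illustration (not the package's code)
csmf_max_error <- function(obs) {
  csmf_obs <- table(obs) / length(obs)  # observed fraction per cause
  2 * (1 - min(csmf_obs))               # worst case: all mass on the rarest cause
}

obs <- c("HIV", "HIV", "HIV", "Stroke")
maxerror <- csmf_max_error(obs)
```

In this example the rarest cause has an observed fraction of 0.25, so the sketch evaluates 2 * (1 - 0.25).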

See Also

Other internal functions: internalGetCSMFAcc(), internalGetCauseMetrics(), internalGetMetrics(), internalNBC()

Examples

library(nbc4va)
obs <- c("HIV", "HIV", "Stroke", "Stroke")
maxerror <- nbc4va::internalGetCSMFMaxError(obs)


Calculate performance metrics table per cause

Description

A table providing performance metrics per unique cause based on input predicted and observed cases.

Usage

internalGetCauseMetrics(pred, obs, causes = unique(c(pred, obs)))

Arguments

pred

Character vector of predicted causes for each case.

obs

Character vector of observed causes for each case.

causes

Character vector of all possible causes including ones that are not in the pred or obs.

Details

This code is built on the original performance metrics code provided by Dr. Mireille Gomes.

Value

out Dataframe of performance metrics per cause (see Methods documentation):

Example:

Cause Sensitivity Metric-n..
HIV 0.5 #..
Stroke 0.5 #..
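
The table above can be approximated with a short base R sketch that counts true and false positives and negatives for each cause in a one-versus-rest fashion. The function name cause_metrics is hypothetical, only a subset of the documented columns is produced, and returning NA for the sensitivity of a cause with no observed cases is an assumption.

```r
# Hypothetical one-versus-rest metrics per cause (not the package's code)
cause_metrics <- function(pred, obs, causes = unique(c(pred, obs))) {
  rows <- lapply(causes, function(cause) {
    tp <- sum(pred == cause & obs == cause)  # predicted and observed
    fn <- sum(pred != cause & obs == cause)  # observed but missed
    fp <- sum(pred == cause & obs != cause)  # predicted but not observed
    tn <- sum(pred != cause & obs != cause)  # neither predicted nor observed
    data.frame(
      Cause = cause,
      TruePositives = tp,
      TrueNegatives = tn,
      FalsePositives = fp,
      FalseNegatives = fn,
      # Assumption: sensitivity is undefined (NA) when the cause is never observed
      Sensitivity = if (tp + fn > 0) tp / (tp + fn) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

pred <- c("HIV", "Stroke", "HIV", "Stroke")
obs <- c("HIV", "HIV", "Stroke", "Stroke")
cmetrics <- cause_metrics(pred, obs)
```

With these inputs each cause has one true positive and one false negative, which gives the sensitivity of 0.5 shown in the example table.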

See Also

Other internal functions: internalGetCSMFAcc(), internalGetCSMFMaxError(), internalGetMetrics(), internalNBC()

Examples

library(nbc4va)
pred <- c("HIV", "Stroke", "HIV", "Stroke")
obs <- c("HIV", "HIV", "Stroke", "Stroke")
cmetrics <- nbc4va::internalGetCauseMetrics(pred, obs)


Calculate overall performance metrics

Description

A vector providing overall performance metrics based on input predicted and observed cases.

Usage

internalGetMetrics(
  pred,
  obs,
  causes = unique(c(pred, obs)),
  csmfa.obs = NULL,
  causeMetrics = internalGetCauseMetrics(pred, obs, causes)
)

Arguments

pred

Character vector of predicted causes for each case.

obs

Character vector of observed causes for each case.

causes

Character vector of all possible causes including ones that are not in the pred or obs.

csmfa.obs

A character vector of the true causes for calculating the CSMF accuracy.

causeMetrics

Dataframe of performance metrics per cause (see internalGetCauseMetrics):

  • Columns: Cause, TruePositives, TrueNegatives, FalsePositives, FalseNegatives, PredictedFrequency, ObservedFrequency, Sensitivity, CSMFpredicted, CSMFobserved

  • Cause (vectorof char): The unique causes from both the obs and pred inputs

  • TruePositives (vectorof double): The total number of true positives per cause

  • TrueNegatives (vectorof double): The total number of true negatives per cause

  • FalsePositives (vectorof double): The total number of false positives per cause

  • FalseNegatives (vectorof double): The total number of false negatives per cause

  • PredictedFrequency (vectorof double): The occurrence of a cause in the pred input

  • ObservedFrequency (vectorof double): The occurrence of a cause in the obs input

  • Sensitivity (vectorof double): the sensitivity for a cause

  • CSMFpredicted (vectorof double): the cause specific mortality fraction for a cause given the predicted deaths

  • CSMFobserved (vectorof double): the cause specific mortality fraction for a cause given the observed deaths

Details

Developer Note: Depends on the internalGetCSMFAcc function to get the CSMF Accuracy.

Value

metrics Named numeric vector of performance metrics (see Methods documentation):

See Also

Other internal functions: internalGetCSMFAcc(), internalGetCSMFMaxError(), internalGetCauseMetrics(), internalNBC()

Examples

library(nbc4va)
pred <- c("HIV", "Stroke", "HIV", "Stroke")
obs <- c("HIV", "HIV", "Stroke", "Stroke")
metrics <- nbc4va::internalGetMetrics(pred, obs)


NBC algorithm source code

Description

Performs Naive Bayes Classification given train and test (validation) datasets, as well as additional information for the train and test data.

Usage

internalNBC(train, test, known = TRUE)

Arguments

train

Dataframe of verbal autopsy train data (See Data documentation).

  • Columns (in order): ID, Cause, Symptom-1 to Symptom-n..

  • ID (vectorof char): unique case identifiers

  • Cause (vectorof char): observed causes for each case

  • Symptom-n.. (vectorsof (1 OR 0)): 1 for presence, 0 for absence, other values are treated as unknown

  • Unknown symptoms are imputed randomly from distributions of 1s and 0s per symptom column; if no 1s or 0s exist then the column is removed

Example:

ID Cause S1 S2 S3
"a1" "HIV" 1 0 0
"b2" "Stroke" 0 0 1
"c3" "HIV" 1 1 0

test

Dataframe of verbal autopsy test data in the same format as train except if causes are not known:

  • The 2nd column (Cause) can be omitted if known is FALSE

known

TRUE to indicate that the test causes are available in the 2nd column and FALSE to indicate that they are not known

Details

This function was built on code provided by Miasnikof et al (2015). Edits to the code included the following improvements:

Value

out The result list object containing:

Author(s)

Pierre Miasnikof (Original), Vasily Giannakeas (Original), Richard Wen (Edits) <wenr@smh.ca>

References

See Also

Other internal functions: internalGetCSMFAcc(), internalGetCSMFMaxError(), internalGetCauseMetrics(), internalGetMetrics()

Examples

library(nbc4va)
data(nbc4vaData)

# Create naive bayes classifier on random train and test data
# Set "known" to indicate whether or not "test" causes are known
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc4va::internalNBC(train, test, known=TRUE)

# Obtain the probabilities and predictions
prob <- results$prob.causes
pred <- results$pred.causes


Round values to whole numbers while preserving the sum

Description

Rounds a vector of values to whole numbers while preserving the sum (rounded if it is not a whole number) using the largest remainder method (Gallagher, 1991).

Usage

internalRoundFixedSum(v, roundSum = round)

Arguments

v

A numeric vector of decimal values, ideally with a whole-number sum, to round.

roundSum

If the sum of the values in v is not a whole number, choose a rounding method to ensure it is a whole number.

Value

out A vector of v with the values rounded to whole numbers but with the whole number sum preserved.
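
The largest remainder method can be sketched in a few lines of base R: floor every value, then hand out the remaining units to the values with the largest fractional parts. The function below is a hypothetical illustration named round_fixed_sum, not the package's implementation, and its tie-breaking (earlier elements first) is an assumption.

```r
# Hypothetical largest-remainder rounding (not the package's code)
round_fixed_sum <- function(v, round_sum = round) {
  target <- round_sum(sum(v))  # whole-number total to preserve
  base <- floor(v)             # start from the rounded-down values
  short <- target - sum(base)  # units still to distribute
  if (short > 0) {
    # Give one extra unit to the values with the largest fractional parts
    idx <- order(v - base, decreasing = TRUE)[seq_len(short)]
    base[idx] <- base[idx] + 1
  }
  base
}

dec <- c(rep(50 / 2, 2), rep(50 / 3, 3))
whole <- round_fixed_sum(dec)
```

Plain round() on the same input would round each 16.67 up to 17 and overshoot the total by one, which is the problem this method avoids.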

References

See Also

Other data functions: internalSubAsRest()

Examples

library(nbc4va)
dec <- c(rep(50/2, 2), rep(50/3, 3))
whole <- nbc4va::internalRoundFixedSum(dec)


Substitute values in a dataframe proportionally to all other values

Description

Substitute a target value proportionally to the distribution of the rest of the values in a column, given the following conditions:

Usage

internalSubAsRest(
  dataset,
  x,
  cols = 1:ncol(dataset),
  ignore = c(NA, NaN),
  removal = FALSE
)

Arguments

dataset

A dataframe with value(s) of x in it.

x

A target value in the dataframe to replace with the rest of the values per column.

cols

A numeric vector of columns to consider for substitution.

ignore

A vector of the rest of the values to ignore for substitution.

removal

Set to TRUE to remove column(s) that consist only of x values.

Details

Pseudocode of algorithm:

  SET dataset = table of values with columns and rows
  SET x = target value for substitution

  IF x in dataset:
    FOR EACH column y in a dataset:
      SET xv = all x values in y
      SET rest = all values not equal to x in y
      IF xv == values in y:
        REMOVE y in dataset
      IF number of unique values of rest == 1:
        MODIFY xv = rest
      IF number of xv values < number of unique values of rest:
        SET xn = number of xv values
        MODIFY xv = random sample of rest with size xn
      ELSE:
        SET xn = number of xv values
        SET p = proportions of rest
        SET xnp = xn * p
        IF xnp has decimals:
          MODIFY xnp = round xnp such that sum(xnp) == xn via largest remainder method
        MODIFY xv = rest values with distribution of xnp
  RETURN dataset
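
The pseudocode can be turned into a runnable base R sketch for a single column. This is an illustrative reading of the steps above, not the package's source: the name sub_as_rest_column is hypothetical, NA values are simply ignored, and a column consisting only of x is returned unchanged so that the caller can decide whether to remove it.

```r
# Hypothetical single-column version of the pseudocode (not the package's code)
sub_as_rest_column <- function(y, x) {
  is_x <- !is.na(y) & y == x
  rest <- y[!is_x & !is.na(y)]  # values to draw replacements from
  xn <- sum(is_x)
  # Nothing to substitute, or nothing to substitute from
  if (xn == 0 || length(rest) == 0) return(y)
  values <- sort(unique(rest))
  if (length(values) == 1) {    # only one other value: use it directly
    y[is_x] <- values
    return(y)
  }
  if (xn < length(values)) {    # too few targets to mirror the distribution
    y[is_x] <- rest[sample.int(length(rest), xn)]
    return(y)
  }
  counts <- tabulate(match(rest, values), nbins = length(values))
  xnp <- xn * counts / length(rest)  # ideal replacement counts per value
  whole <- floor(xnp)
  short <- round(xn - sum(whole))
  if (short > 0) {                   # largest remainder method
    idx <- order(xnp - whole, decreasing = TRUE)[seq_len(short)]
    whole[idx] <- whole[idx] + 1
  }
  fill <- rep(values, times = whole)
  y[is_x] <- fill[sample.int(length(fill))]  # shuffle before assigning
  y
}

y <- c(1, 0, 99, 99, 1, 1, 99, 99)
out <- sub_as_rest_column(y, 99)
```

In this example the known values are three 1s and one 0, so the four unknown entries are replaced by three 1s and one 0 in random order.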

Value

out A dataframe or list depending on removal:

See Also

Other data functions: internalRoundFixedSum()

Examples

library(nbc4va)
data(nbc4vaDataRaw)
unclean <- nbc4vaDataRaw
clean <- nbc4va::internalSubAsRest(unclean, 99)


Train a NBC model

Description

Performs supervised Naive Bayes Classification on verbal autopsy data.

Usage

nbc(train, test, known = TRUE)

Arguments

train

Dataframe of verbal autopsy train data (See Data documentation).

  • Columns (in order): ID, Cause, Symptom-1 to Symptom-n..

  • ID (vectorof char): unique case identifiers

  • Cause (vectorof char): observed causes for each case

  • Symptom-n.. (vectorsof (1 OR 0)): 1 for presence, 0 for absence, other values are treated as unknown

  • Unknown symptoms are imputed randomly from distributions of 1s and 0s per symptom column; if no 1s or 0s exist then the column is removed

Example:

ID Cause S1 S2 S3
"a1" "HIV" 1 0 0
"b2" "Stroke" 0 0 1
"c3" "HIV" 1 1 0

test

Dataframe of verbal autopsy test data in the same format as train except if causes are not known:

  • The 2nd column (Cause) can be omitted if known is FALSE

known

TRUE to indicate that the test causes are available in the 2nd column and FALSE to indicate that they are not known

Value

out The result nbc list object containing:

References

See Also

Other main functions: plot.nbc(), print.nbc_summary(), summary.nbc()

Examples

library(nbc4va)
data(nbc4vaData)

# Run naive bayes classifier on random train and test data
# Set "known" to indicate whether or not "test" causes are known
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test, known=TRUE)

# Obtain the probabilities and predictions
prob <- results$prob.causes
pred <- results$pred.causes


nbc4va: Bayes Classifier for Verbal Autopsy Data

Description

An implementation of the Naive Bayes Classifier (NBC) algorithm used for Verbal Autopsy (VA) built on code from Miasnikof et al (2015) <DOI:10.1186/s12916-015-0521-2>.

For documentation and help, please see:

https://rrwen.github.io/nbc4va/

Acknowledgements

This package was developed at the Centre for Global Health Research (CGHR) in Toronto, Ontario, Canada. The original NBC algorithm code was developed by Pierre Miasnikof and Vasily Giannakeas. The original performance metrics code was provided by Dr. Mireille Gomes, who also offered guidance in metrics implementation and user testing. Special thanks to Richard Zehang Li for providing a standard structure for the package and Patrycja Kolpak for user testing of the GUI.

Author(s)

Richard Wen <rrwen.dev@gmail.com>

References

Use citation("nbc4va") to view citation information for the nbc4va package.

Examples

## Not run: 
library(nbc4va)

# Quick start
# Follow the instructions in the web interface
nbc4vaGUI()

# View user guides for the nbc4va package
browseVignettes("nbc4va")

## End(Not run)


Example of clean data in nbc4va

Description

A random generation of clean verbal autopsy synthetic data for use in demonstrating the nbc4va package.

Usage

nbc4vaData

Format

A dataframe with 100 rows and 102 columns:

Source

Random generation using the sample function with set.seed set to 1.

Examples

library(nbc4va)
data(nbc4vaData)

Example of unclean data in nbc4va

Description

A random generation of unclean verbal autopsy synthetic data for use in demonstrating the nbc4va package.

Usage

nbc4vaDataRaw

Format

A dataframe with 100 rows and 102 columns:

Details

Warning: This data may produce errors depending on how you use it in the package.

Source

Random generation using the sample function with set.seed set to 1.

Examples

library(nbc4va)
data(nbc4vaDataRaw)

Web-based graphical user interface in nbc4va

Description

A Graphical User Interface (GUI) for the nbc4va package using shiny.

Usage

nbc4vaGUI()

Details

This function requires the shiny package, which can be installed via:

install.packages("shiny")

Press Esc in the R console to stop the GUI.

Please use a modern browser (e.g. the latest Firefox or Chrome) for the best experience.

Value

Creates a GUI for running nbc4va in a web browser.

See Also

Other utility functions: nbc4vaIO()

Examples

## Not run: 
library(nbc4va)
nbc4vaGUI()

## End(Not run)


Run nbc4va using file input and output

Description

Runs nbc and uses summary.nbc on input data files or dataframes to output result files or dataframes with data on predictions, probabilities, causes, and performance metrics in an easily accessible way.

Usage

nbc4vaIO(
  trainFile,
  testFile,
  known = TRUE,
  csmfaFile = NULL,
  saveFiles = TRUE,
  outDir = dirname(testFile),
  fileHeader = strsplit(basename(testFile), "\\.")[[1]][[1]],
  fileReader = read.csv,
  fileReaderIn = "file",
  fileReaderArgs = list(as.is = TRUE),
  fileWriter = write.csv,
  fileWriterIn = "x",
  fileWriterOut = "file",
  fileWriterArgs = list(row.names = FALSE),
  outExt = "csv"
)

Arguments

trainFile

A character value of the path to the data to be used as the train argument for nbc or a dataframe of the train argument.

testFile

A character value of the path to the data to be used as the test argument for nbc or a dataframe of the test argument.

known

TRUE to indicate that the test causes are available in the 2nd column and FALSE to indicate that they are not known

csmfaFile

A character value of the path to the data to be used as the csmfa.obs argument for summary.nbc or a named vector of the csmfa.obs argument.

  • If (csmfaFile is char): the file must have only 1 column of the causes per case

saveFiles

Set to TRUE to save the return object as files or FALSE to return the actual object

outDir

A character value of the path to the directory to store the output results files.

  • The default is the directory of the testFile

fileHeader

A character value of the file header name to use for the output results files.

  • The default is to use the name of the testFile

fileReader

A function that is able to read the trainFile and the testFile.

  • The default is set to read csv files using read.csv

fileReaderIn

A character value of the fileReader argument name that accepts a file path for reading as an input.

fileReaderArgs

A list of the fileReader arguments to be called with do.call.

fileWriter

A function that is able to write data.frame objects to a file location.

  • The default is set to write csv files using write.csv

fileWriterIn

A character value of the fileWriter argument name that accepts a dataframe for writing.

fileWriterOut

A character value of the fileWriter argument name that accepts a file path for writing as an output.

fileWriterArgs

A list of the fileWriter arguments to be called with do.call.

outExt

A character value of the extension (without the period) to use for the result files.

  • The default is set to use the "csv" extension
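
The way fileReader, fileReaderIn, and fileReaderArgs fit together can be pictured with a small base R sketch. It is an assumption about how such arguments are typically combined with do.call, not a copy of the package's code, and the helper name read_with is hypothetical.

```r
# Hypothetical helper showing how a reader, its path argument name,
# and its extra arguments could be combined with do.call
read_with <- function(path, fileReader = read.csv, fileReaderIn = "file",
                      fileReaderArgs = list(as.is = TRUE)) {
  # Build a named argument list: the path goes under the name in fileReaderIn
  args <- c(stats::setNames(list(path), fileReaderIn), fileReaderArgs)
  do.call(fileReader, args)
}

# Write a tiny illustrative csv to a temporary file and read it back
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = c("a1", "b2"), Cause = c("HIV", "Stroke")),
          tmp, row.names = FALSE)
dat <- read_with(tmp)
unlink(tmp)
```

Under this pattern, a reader whose path argument has a different name would only need a different fileReaderIn value; the writer arguments follow the same idea with fileWriterIn and fileWriterOut.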

Details

See Methods documentation for details on the methodology and implementation of the Naive Bayes Classifier algorithm. This function may also act as a wrapper for the main nbc4va package functions.

Value

out Vector or list of respective paths or data from the naive bayes classifier:

See Also

Other utility functions: nbc4vaGUI()

Examples

library(nbc4va)
data(nbc4vaData)

# Split data into train and test sets
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]

# Save train and test data as csv in temp location
trainFile <- tempfile(fileext=".csv")
testFile <- tempfile(fileext=".csv")
write.csv(train, trainFile, row.names=FALSE)
write.csv(test, testFile, row.names=FALSE)

# Use nbc4vaIO via file input and output
# Set "known" to indicate whether test causes are known
outFiles <- nbc4vaIO(trainFile, testFile, known=TRUE)

# Use nbc4vaIO as a wrapper
out <- nbc4vaIO(train, test, known=TRUE, saveFiles=FALSE)


Translate open verbal autopsy arguments to train a NBC model

Description

A wrapper function for creating an nbc object with the parameters specified by the openVA package.

Usage

ova2nbc(symps.train, symps.test, causes.train, causes.table = NULL, ...)

Arguments

symps.train

Dataframe of verbal autopsy train data.

  • Columns (in order): ID, Cause, Symptom-1 to Symptom-n..

  • ID (vectorof char): case identifiers

  • Cause (vectorof char): observed causes for each case

  • Symptom-n.. (vectorsof char): "Y" for presence, "" for absence, "." for missing

Example:

ID Cause S1 S2 S3
"a1" "HIV" "Y" "" "."
"b2" "Stroke" "." "" "Y"
"c3" "HIV" "Y" "Y" "."

symps.test

Dataframe of verbal autopsy test data in the same format as symps.train.

  • If (causes.train is (vectorof char)): symps.test is assumed to not have a cause column

causes.train

The train vector or column for the causes of death to use.

  • If (vectorof char): cause of death values with number of values equal to nrow(symps.train); it is assumed that symps.test has no causes of death column

  • If (char): name of cause of death column from symps.train

causes.table

Character list of unique causes to learn.

  • If (NULL): set to unique causes of death in symps.train

...

Additional arguments to be passed to avoid errors if necessary.
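
The symptom coding described for symps.train ("Y", "", ".") differs from the 1, 0, and unknown coding that nbc() expects. The base R sketch below shows one way such a translation could look. It is not the package's internal code: the helper recode_symptom is hypothetical, and using 99 for missing values is an assumption borrowed from the default unknown value of internalCheckNBC().

```r
# Illustrative openVA-style data matching the example table above
symps <- data.frame(
  ID = c("a1", "b2", "c3"),
  Cause = c("HIV", "Stroke", "HIV"),
  S1 = c("Y", ".", "Y"),
  S2 = c("", "", "Y"),
  S3 = c(".", "Y", "."),
  stringsAsFactors = FALSE
)

# Hypothetical recoding: "Y" -> 1, "" -> 0, anything else -> unknown
recode_symptom <- function(col, unknown = 99) {
  out <- rep(unknown, length(col))
  out[col == "Y"] <- 1
  out[col == ""] <- 0
  out
}

symptom_cols <- setdiff(names(symps), c("ID", "Cause"))
recoded <- symps
recoded[symptom_cols] <- lapply(symps[symptom_cols], recode_symptom)
```

After recoding, the ID and Cause columns are untouched and the symptom columns are numeric.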

Value

nbc An nbc object with the following modifications:

References

Examples

## Not run: 
library(openVA)  # install.packages("openVA")
library(nbc4va)

# Obtain some openVA formatted data
data(RandomVA3) # cols: deathId, cause, symptoms..
train <- RandomVA3[1:100, ]
test <- RandomVA3[101:200, ]

# Run naive bayes classifier on openVA data
results <- ova2nbc(train, test, "cause")

# Obtain the probabilities and predictions
prob <- results$prob.causes
pred <- results$pred.causes

## End(Not run)


Bar plot of top predicted causes from a NBC model

Description

Plots the results from a nbc object as a barplot for a number of causes based on predicted Cause Specific Mortality Fraction (CSMF).

Usage

## S3 method for class 'nbc'
plot(
  x,
  top.plot = length(x$causes.pred),
  min.csmf = 0,
  csmfa.obs = NULL,
  footnote = TRUE,
  footnote.color = "gray48",
  footnote.size = 0.7,
  main = paste("Naive Bayes Classifier: Top ", top.plot, " Causes by Predicted CSMF",
    sep = ""),
  xlab = "Predicted CSMF",
  col = "dimgray",
  horiz = TRUE,
  border = NA,
  las = 1,
  ...
)

Arguments

x

A nbc object.

top.plot

A number that produces top k causes depending on a Cause Specific Mortality Fraction (CSMF) measure.

min.csmf

A number that represents the minimum CSMF measure for a cause to be included in the plot.

csmfa.obs

A character vector of the true causes for calculating the CSMF accuracy.

footnote

A boolean indicating whether to include a footnote containing details about the nbc or not.

footnote.color

A character specifying the color of the footnote text.

footnote.size

A numeric value specifying the size of the footnote text.

main

A character value of the title to display.

xlab

A character value of the x axis title.

col

A character value of the color to use for the plot.

horiz

Set to TRUE to draw bars horizontally and FALSE to draw bars vertically.

border

A character value of the colors to use for the bar borders. Set to NA to disable.

las

An integer value to determine if labels should be parallel or perpendicular to axis.

...

Additional arguments to be passed to barplot.

Details

See Methods documentation for details on CSMF and CSMF accuracy.
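
How top.plot and min.csmf narrow down the causes shown can be sketched in base R as a filter followed by a sort. The CSMF values below are made up, select_causes is a hypothetical name, and treating min.csmf as an inclusive threshold is an assumption.

```r
# Made-up predicted CSMFs for illustration
csmf <- c(HIV = 0.40, Stroke = 0.30, Malaria = 0.25, Injury = 0.05)

# Hypothetical selection step (not the package's code)
select_causes <- function(csmf, top.plot = length(csmf), min.csmf = 0) {
  kept <- csmf[csmf >= min.csmf]         # assumption: threshold is inclusive
  kept <- sort(kept, decreasing = TRUE)  # largest CSMF first
  head(kept, top.plot)                   # keep at most top.plot causes
}

shown <- select_causes(csmf, top.plot = 3, min.csmf = 0.1)
```

A vector like shown could then be passed to barplot() with horiz = TRUE and las = 1 to resemble the defaults listed in the usage.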

Value

Generates a bar plot of the top predicted causes from the NBC model.

See Also

barplot

Other main functions: nbc(), print.nbc_summary(), summary.nbc()

Examples

library(nbc4va)
data(nbc4vaData)

# Run naive bayes classifier on random train and test data
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test)

# Plot the top 3 causes by CSMF
plot(results, top.plot=3)


Print top predicted causes from a NBC model

Description

Prints a summary message from a summary.nbc object of the top causes by probability or predicted Cause Specific Mortality Fraction (CSMF).

Usage

## S3 method for class 'nbc_summary'
print(x, ...)

Arguments

x

A summary.nbc object.

...

Additional arguments to be passed if applicable.

Details

See Methods documentation for details on CSMF and probability from the Naive Bayes Classifier.

Value

Prints a summary of the top causes of death by probability for the NBC model.

See Also

Other main functions: nbc(), plot.nbc(), summary.nbc()

Examples

library(nbc4va)
data(nbc4vaData)

# Run naive bayes classifier on random train and test data
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test)

# Print a summary of all the test data for the top 3 causes by predicted CSMF
brief <- summary(results, top=3)
print(brief)


Summarize a NBC model with metrics

Description

Summarizes the results from a nbc object. The summary can be either for a particular case or for the entirety of cases.

Usage

## S3 method for class 'nbc'
summary(object, top = 5, id = NULL, csmfa.obs = NULL, ...)

Arguments

object

The result nbc object.

top

A number that produces top causes depending on id:

  • If (id is char): provide the top causes of the case by probability

  • If (id is NULL): provide the top causes by predicted Cause Specific Mortality Fractions (CSMF)

id

A character representing a case id in the test data.

csmfa.obs

A character vector of the true causes for calculating the CSMF accuracy.

...

Additional arguments to be passed if applicable

Details

See Methods documentation for details on calculations and metrics.

Value

out A summary object built from a nbc object with modifications/additions:

See Also

Other main functions: nbc(), plot.nbc(), print.nbc_summary()

Examples

library(nbc4va)
data(nbc4vaData)

# Run naive bayes classifier on random train and test data
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test)

# Obtain a summary for the results
brief <- summary(results, top=2)  # top 2 causes by CSMF for all test data
briefID <- summary(results, id="v48")  # top 5 causes by probability for case "v48"


Cause of death predictions from a NBC model

Description

Obtains the top causes of death for each testing case from a result nbc object.

Usage

topCOD.nbc(object)

Arguments

object

The result nbc object.

Value

out A dataframe of the top CODs:

See Also

Other wrapper functions: csmf.nbc()

Examples

library(nbc4va)
data(nbc4vaData)

# Run naive bayes classifier on random train and test data
train <- nbc4vaData[1:50, ]
test <- nbc4vaData[51:100, ]
results <- nbc(train, test)

# Obtain the top cause of death predictions for the test data
topPreds <- topCOD.nbc(results)