[R] Naive Bayes Classifier

Huntsinger, Reid reid_huntsinger at merck.com
Thu May 17 20:15:54 CEST 2001


The "naive Bayes" classifier I've seen discussed in various machine-learning
papers and books is as described by David Meyer in his posting, except that
class (mixture component) membership is known in the training data. So it's
"supervised"--classes aren't "latent". The estimation is usually just via
"plug-in":

1. Compute marginal frequencies within class.

2. Multiply these together as if the variables (say x) were independent
within class to get an "estimate" of the class-conditional probabilities
p(x | c).

3. Via Bayes' rule, get the (x-) conditional probabilities over class
(posterior class probabilities) p(c | x). (Actually you don't need to divide
by p(x) here, since it's a common factor in the quantities to be compared to
get the classifier...)

4. To classify x find the class c maximizing p(c | x) (or minimizing the sum
of L(c,i)*p(i|x) over i if L(,) is a given loss function).
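
The four steps above can be sketched in a few lines. This is only a minimal
illustration, written in Python rather than R, under assumptions that are
mine and not part of the description: the variables are categorical, p(c) is
estimated by the class frequencies in the training data, and the function
names, the toy data and the loss values are all made up for the example.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Step 1: within-class marginal frequencies (plus class frequencies)."""
    n = len(labels)
    class_counts = Counter(labels)
    # marg[c][j][v] = number of class-c rows whose j-th variable equals v
    marg = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(rows, labels):
        for j, v in enumerate(x):
            marg[c][j][v] += 1
    # Assumption: p(c) is estimated by the class frequency in the training data.
    prior = {c: k / n for c, k in class_counts.items()}
    cond = {
        c: {j: {v: k / class_counts[c] for v, k in cnt.items()}
            for j, cnt in marg[c].items()}
        for c in class_counts
    }
    return prior, cond

def joint_scores(x, prior, cond):
    """Step 2, times p(c): p(c) * prod_j p(x_j | c), not yet divided by p(x)."""
    scores = {}
    for c, p in prior.items():
        s = p
        for j, v in enumerate(x):
            s *= cond[c][j].get(v, 0.0)  # value unseen in class c: estimate 0
        scores[c] = s
    return scores

def posterior(x, prior, cond):
    """Step 3: divide by p(x), the sum of the scores over classes."""
    scores = joint_scores(x, prior, cond)
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("x has estimated probability 0 under every class")
    return {c: s / total for c, s in scores.items()}

def classify(x, prior, cond, loss=None):
    """Step 4: maximize p(c | x), or minimize expected loss if loss is given."""
    post = posterior(x, prior, cond)
    if loss is None:
        return max(post, key=post.get)
    # loss[(c, i)] is the cost of predicting c when the true class is i
    return min(post, key=lambda c: sum(loss[(c, i)] * post[i] for i in post))

# Hypothetical toy data: two categorical variables, two classes.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild"),
        ("rainy", "mild"), ("sunny", "mild"), ("rainy", "hot"), ("sunny", "hot")]
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]
prior, cond = fit_naive_bayes(rows, labels)

x = ("rainy", "mild")
print(classify(x, prior, cond))        # yes  (p(yes | x) = 0.9)

# Made-up loss: wrongly predicting "yes" is 20 times worse than wrongly
# predicting "no".
loss = {("yes", "yes"): 0, ("no", "no"): 0, ("yes", "no"): 20, ("no", "yes"): 1}
print(classify(x, prior, cond, loss))  # no   (expected loss 0.9 versus 2.0)
```

For the maximizing rule, joint_scores() alone would give the same answer as
posterior(), which is the point of the parenthetical remark in step 3. The
division is kept here only because the loss-based rule in step 4 is easier
to read with proper probabilities.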

Often step 1 is replaced by Bayesian estimates of the marginal probabilities
to prevent 0 estimates and reduce variance. 
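
One common choice of such a Bayesian estimate is sketched below, again in
Python for illustration only. It is the posterior mean under a symmetric
Dirichlet prior, i.e. "add alpha" smoothing. That particular prior, the name
smoothed_marginal and the toy values are my assumptions; other Bayesian
estimates would serve the same purpose.

```python
from collections import Counter

def smoothed_marginal(values, levels, alpha=1.0):
    """Posterior-mean estimate of p(v) under a symmetric Dirichlet(alpha) prior.

    values: observed values of one variable within one class
    levels: all values the variable can take (needed to count unseen ones)
    """
    counts = Counter(values)
    denom = len(values) + alpha * len(levels)
    return {v: (counts[v] + alpha) / denom for v in levels}

levels = ["hot", "mild", "cold"]
in_class = ["hot", "hot", "mild", "hot"]  # "cold" never observed in this class

# Plain frequencies: "cold" gets probability 0, which zeroes the whole product.
raw = {v: in_class.count(v) / len(in_class) for v in levels}

# Smoothed estimates: (3+1)/7, (1+1)/7, (0+1)/7, all strictly positive.
smooth = smoothed_marginal(in_class, levels)
```

With alpha = 1 this is Laplace's rule; larger alpha pulls the estimates
harder toward the uniform distribution, trading some bias for lower
variance.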

In case you don't find an R implementation I hope the above is helpful.

A final remark: while the expression for the posterior probabilities is the
same as for logistic regression (as Brian Ripley pointed out), the
estimation is different--even in large samples--when the model is incorrect
(as it is anticipated to be by the "naive" qualifier). Tom Mitchell's talk
at the SIAM Data Mining conference had an example of this, citing large
gains in performance by switching from the naive Bayes approach to
maximizing the logistic regression likelihood.
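
To spell out the remark, here is the posterior written both ways. The
notation (alpha, beta, the index j over variables and n over training cases)
is mine, not from the talk or the earlier postings.

```latex
\begin{align*}
p(c \mid x)
  &= \frac{p(c)\prod_{j} p(x_j \mid c)}{\sum_{i} p(i)\prod_{j} p(x_j \mid i)}
   = \frac{\exp\bigl(\alpha_c + \sum_{j} \beta_{c,j}(x_j)\bigr)}
          {\sum_{i} \exp\bigl(\alpha_i + \sum_{j} \beta_{i,j}(x_j)\bigr)}, \\
\alpha_c &= \log p(c), \qquad \beta_{c,j}(v) = \log p(x_j = v \mid c). \\
\text{naive Bayes (plug-in):}\quad
  & \hat\alpha_c = \log \hat p(c), \qquad
    \hat\beta_{c,j}(v) = \log \hat p(x_j = v \mid c), \\
\text{logistic regression:}\quad
  & (\hat\alpha, \hat\beta)
    = \arg\max_{\alpha,\beta} \sum_{n} \log p_{\alpha,\beta}(c_n \mid x_n).
\end{align*}
```

Both fits have the same main-effects-only form. Naive Bayes fills in the
coefficients from the within-class marginals, while logistic regression
chooses them to maximize the conditional likelihood of the classes given x,
so the two need not agree when the independence assumption fails.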

Reid Huntsinger

-----Original Message-----
From: David Meyer [mailto:david.meyer at ci.tuwien.ac.at]
Sent: Thursday, May 17, 2001 5:32 AM
To: Murray Jorgensen
Cc: Ursula Sondhauss; r-help at stat.math.ethz.ch
Subject: Re: [R] Naive Bayes Classifier


Murray Jorgensen wrote:
> 
> As I understand Naive Bayes it is essentially a finite mixture model for
> multivariate categorical distributions where the variables are independent
> in each component of the mixture. That is, I believe it to be a synonym
> for Latent Class analysis. I believe the Fraley/Raftery package mclust may
> include this sort of model, and possibly other packages. Certainly these
> models may be expressed in the language of graphical models. Whether or
> not this would be useful for estimation purposes I am uncertain.

You could also try lca() in package e1071.

-d

> 
> Murray Jorgensen
> 
> At 04:28 PM 16-05-01 +0100, Prof Brian Ripley wrote:
> >On Wed, 16 May 2001, Ursula Sondhauss wrote:
> >
> >> I am looking for an implementation of the Naive Bayes classifier for a
> >> multi-class classification problem. I can not even find the Naive Bayes
> >> classifier for two classes, though I can not believe it is not
> >> available. Can anyone help me?
> >
> >Hard to believe but likely true. However, as I understand this, it
> >applies to a (K+1)-way contingency table, with K explanatory factors and
> >one response.  And the `naive Bayes' model is a particular model for that
> >table.  If you want a classifier, you only need the conditional
> >distribution of the response given the explanatory factors, and that is a
> >main-effects-only multiple logistic model.  Now the *estimation*
> >procedures may be slightly different (`naive Bayes' is not fully
> >defined), but if that does not matter, use multinom() in package nnet to
> >fit this.
> >
> >A book on Graphical Modelling (e.g. Whittaker or Edwards) may help
> >elucidate the connections.
> >
> >Let me stress *as I understand this* here.
> >
> >--
> >Brian D. Ripley,                  ripley at stats.ox.ac.uk
> >Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >University of Oxford,             Tel:  +44 1865 272861 (self)
> >1 South Parks Road,                     +44 1865 272860 (secr)
> >Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >
> >-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> >r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> >Send "info", "help", or "[un]subscribe"
> >(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> >_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
> >
> Murray Jorgensen,  Department of Statistics,  U of Waikato, Hamilton, NZ
> -----[+64-7-838-4773]---------------------------[maj at waikato.ac.nz]-----
> "Doubt everything or believe everything:these are two equally convenient
> strategies. With either we dispense with the need to think."
> http://www.stats.waikato.ac.nz/Staff/maj.html          - Henri Poincare'
> 

-- 
Mag. David Meyer			Wiedner Hauptstrasse 8-10
Vienna University of Technology		A-1040 Vienna/AUSTRIA
Department for Statistics, Probability	Tel.: (+431) 58801/10772
Theory and Actuarial Mathematics	mail: david.meyer at ci.tuwien.ac.at


