[R] bayesian text classification...

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Tue Jan 21 09:35:02 CET 2003


rossini at blindglobe.net (A.J. Rossini) writes:

> for Spam.
> 
> In the process of setting up a more effective spam filtering system, I
> just noticed that bogofilter, which implements extensions of the (a?)
> "Naive Bayes" text classification approach, will dump out R data
> frames; the man page suggests how to "integrate" it with R for
> verification (sort of, that is).
> 
> Anyway, for those of you looking for silly and perhaps interesting
> problems/datasets for your engineering or comp-sci statistics classes,
> this one looks quite amusing...
> 
> Looks like Eric Raymond knows (about) R -- a script is apparently
> included in the source according to the man page, though I couldn't
> find it in the Debian package.

The text at http://www.bgl.nu/bogofilter/BcrFisher.html certainly has
one. It could be interesting to try to figure out what is actually
going on there - some of it certainly looks weird, and the last time I
looked at "Naive Bayes" I got the impression that these people would
label anything returning a probability as "Bayesian"...

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



