[R] Problem with Snowball & RWeka

flobede flobede at hotmail.com
Thu Jun 2 20:49:51 CEST 2011


Greetings to all,
I have a similar issue with Snowball.
I am running R version 2.12.1 (2010-12-16) on Windows 7.

Here is my script:
----
library(tm)

custom.xml <- system.file("texts", "custom.xml", package = "tm")
print(readLines(custom.xml), quote = FALSE)

myXMLReader <- readXML(
  spec = list(
    Language = list("node", "/document/language"),
    DateTimeStamp = list("node", "/document/date"),
    Origin = list("node", "/document/source"),
    Description = list("node", "/document/subject"),
    Type = list("node", "/document/country"),
    Heading = list("node", "/document/title"),
    Content = list("node", "/document/contenu"),
    Author = list("node", "/document/author")),
  doc = PlainTextDocument())

mySource <- function(x, encoding = "UTF-8")
  XMLSource(x, function(tree) XML::xmlRoot(tree)$children, myXMLReader, encoding)

corpusmf <- Corpus(mySource(custom.xml))
meta(corpusmf[[1]])
meta(corpusmf[[2]])

corpusmf <- tm_map(corpusmf, stripWhitespace)
corpusmf <- tm_map(corpusmf, removeNumbers)
corpusmf <- tm_map(corpusmf, removePunctuation)
corpusmf <- tm_map(corpusmf, stemDocument)

# Named tdm rather than matrix, to avoid masking base::matrix()
tdm <- TermDocumentMatrix(corpusmf, control = list(weighting = weightBin))
print(tdm)
 
-----
stemDocument returns an error message:
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!

I tried calling library(Snowball) beforehand, but the result is the same.
I found a clue on the Weka website:
http://weka.wikispaces.com/The+snowball+stemmers+don%27t+work,+what+am+I+doing+wrong%3F
but I don't understand what I should do with these archives.
I would be grateful if someone could help with this.
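
A minimal diagnostic sketch that may help narrow this down, assuming the Snowball and rJava packages are installed. It calls the stemmer directly, without tm, and prints the Java class path that R is using. One possible cause (not confirmed in this thread) is that the JVM cannot find the Snowball stemmer classes, which is what the Weka page above seems to describe. The sample words are arbitrary.

```r
# Diagnostic sketch only; assumes the Snowball and rJava packages are installed.
library(Snowball)

# 1. Call the stemmer directly, bypassing tm_map().
#    If "Stemmer 'porter' unknown!" also appears here, tm is not the cause.
print(SnowballStemmer(c("running", "documents"), language = "porter"))

# 2. Show the CLASSPATH environment variable seen by the R session.
print(Sys.getenv("CLASSPATH"))

# 3. Show the class path of the JVM that was started when Snowball was loaded.
print(rJava::.jclassPath())

# 4. Record the R and package versions, useful when reporting the problem.
print(sessionInfo())
```

If step 1 prints the same message, the output of steps 2 to 4 is probably worth including in a follow-up post.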
Kind regards,



