[R] Rstem with Portuguese language

Paulo Cortez pcortez at dsi.uminho.pt
Mon Jul 28 17:59:36 CEST 2008


Greetings,

I have R 2.7.1 on Mac OS, and I believe UTF-8 support is already set up.
At least,

 > Sys.getenv()

shows several variables, including:
  LANG "pt_PT.UTF-8"

I installed the Rstem and tm packages, and when I run the following code:

 > wordStem(c("aberração","aberrações"), language="portuguese")
[1] "aberraç\xc3" "aberraçõ"
Warning message:
In wordStem(c("aberração", "aberrações"), language = "portuguese") :
   Currently, only 'english' is tested. You will need support for UTF characters.
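The first result looks as if a two-byte UTF-8 character was cut in half. The following is a minimal sketch using only base R (no Rstem calls) that compares the character and byte lengths of "aberração" and drops its last two bytes. It assumes, without confirmation from the Rstem documentation, that the stemmer works on raw bytes rather than UTF-8 characters; the variable names are illustrative only.

```r
# Build the word with \u escapes so the bytes do not depend on how this
# file is saved; R stores the string as UTF-8.
x <- "aberra\u00e7\u00e3o"

nchar(x, type = "chars")   # 9 characters
nchar(x, type = "bytes")   # 11 bytes: "ç" and "ã" take two bytes each
charToRaw(x)               # 61 62 65 72 72 61 c3 a7 c3 a3 6f

# Hypothetical illustration: remove the last two *bytes* (a3 6f), as a
# byte-oriented stemmer might, instead of the last two characters.
b <- charToRaw(x)
kept <- b[seq_len(length(b) - 2)]
kept                       # 61 62 65 72 72 61 c3 a7 c3: "ã" is left half-encoded
rawToChar(kept)            # in a UTF-8 locale this should display as "aberraç\xc3"
```

If this assumption holds, the stray `\xc3` in the output above is the leading byte of "ã" (`c3 a3`) left behind after its second byte was removed.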

So my question is: am I using Rstem incorrectly, or do I not actually have
UTF-8 support? What do I need to do?

Best regards,
-- 
Paulo Alexandre Ribeiro Cortez  (PhD, MSc)
Lecturer (Prof. Auxiliar) at the Department of Information Systems (DSI)
University of Minho, Campus de Azurém, 4800-058 Guimaraes, Portugal
http://www.dsi.uminho.pt/~pcortez +351253510313 Fax:+351253510300
