[R] DocumentTermMatrix error

Matevž Pavlič matevz.pavlic at gi-zrmk.si
Sat May 21 13:26:40 CEST 2011


Hi all, 

 

I have tried to create a DocumentTermMatrix with the tm package, but I get this error:

 

Error in tolower(txt) : 

  invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'

 

I tried doing this as shown in:

http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining),

 

with this R code:

 

library(tm)

setwd("C:/Users/mpavlic/Desktop/temp")

tekst <- Corpus(DirSource("."))

>Warning message:

>In readLines(y, encoding = x$Encoding) :

>incomplete final line found on './test.txt'

 

meta(tekst, "Heading", "local") <- c("test")

meta(tekst[[1]])

>Available meta data pairs are:

  Author       : 

   DateTimeStamp: 2011-05-21 11:25:21

   Description  : 

   Heading      : test

  ID           : test.txt

  Language     : en

  Origin       :

 

test <- TermDocumentMatrix(tekst)

> Error in tolower(txt) : 

> invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
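
A possible way to narrow this down, sketched under the assumption that test.txt is stored in a Windows code page such as CP1250 rather than UTF-8 (nothing in the output above confirms this). The byte values and the word used here are made up for illustration; only base R functions (rawToChar, validUTF8, iconv, utf8ToInt) are used, and validUTF8 needs R 3.3.0 or later.

```r
# Illustrative bytes only: the word "ŽLEB" as it would be stored in CP1250,
# where "Ž" is the single byte 0x8E (not valid on its own in UTF-8).
raw_bytes <- as.raw(c(0x8e, 0x4c, 0x45, 0x42))
txt <- rawToChar(raw_bytes)

# validUTF8() reports whether each string is a valid UTF-8 byte sequence.
is_valid_before <- validUTF8(txt)

# Re-encode to UTF-8. The source encoding "CP1250" is an assumption;
# iconv() returns NA if the conversion is not possible.
fixed <- iconv(txt, from = "CP1250", to = "UTF-8")
is_valid_after <- validUTF8(fixed)

print(is_valid_before)
print(is_valid_after)
```

If the same check on the lines of test.txt (read with readLines) shows invalid UTF-8, then either re-encoding the text as above or passing the encoding name to the encoding argument of DirSource may avoid the error in tolower. Which encoding the file really uses would still have to be confirmed.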

 

 

Attached is a small sample (test.txt) that I worked with.

 

Any help would be appreciated,

m

 

 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110521/fe77f990/attachment.txt>

