[R] Text Mining - Remove punctuation not removing quotes and dashes

Anindya Sankar Dey anindya55 at gmail.com
Mon Jun 8 08:54:53 CEST 2015


Hi,

I have been doing some text mining. I created the document-term matrix
(DTM) using the following steps.

corpus1 <- VCorpus(VectorSource(resume1$Dat1))

corpus1 <- tm_map(corpus1, content_transformer(tolower))

dtm <- DocumentTermMatrix(corpus1,
                          control = list(removePunctuation = TRUE,
                                         removeNumbers = TRUE,
                                         removeSparseTerms = TRUE,
                                         stopwords = TRUE))


After running all of this, I am still getting terms such as -quotation,
"fun, model", etc.

What can I do about it? I do not need these dashes and extra quotation marks.
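
For illustration, here is a minimal sketch of the kind of cleanup I am after.
It assumes (the raw text is not shown here, so this is only a guess) that the
leftover characters may be typographic quotes and en/em dashes rather than
plain ASCII punctuation. strip_punct is a hypothetical helper name, not
something provided by tm; it uses only base R.

```r
# Hypothetical helper: replace ASCII punctuation plus a few typographic
# quotes and dashes with a space, then tidy up the whitespace.
strip_punct <- function(x) {
  # \u2018 \u2019 = curly single quotes, \u201C \u201D = curly double quotes,
  # \u2013 = en dash, \u2014 = em dash
  x <- gsub("[[:punct:]\u2018\u2019\u201C\u201D\u2013\u2014]+", " ", x)
  x <- gsub("[[:space:]]+", " ", x)   # collapse runs of whitespace
  gsub("^ +| +$", "", x)              # drop leading/trailing spaces
}

cleaned <- strip_punct(c("-quotation", "\"fun, model\""))

# Possible use on the corpus before building the DTM (requires tm):
# corpus1 <- tm_map(corpus1, content_transformer(strip_punct))
```

Replacing with a space rather than an empty string keeps words on either
side of a dash from being glued together, at the cost of splitting
contractions such as "don't" into two tokens.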

-- 
Anindya Sankar Dey
