[R] twitteR and wordcloud()

Doran, Harold HDoran at air.org
Thu Mar 12 00:47:19 CET 2015


I am trying to replicate the twitteR and word cloud example found here:

https://sites.google.com/site/miningtwitter/questions/talking-about/wordclouds/wordcloud1

When I run it verbatim, I reproduce the results and everything works fine. But when I make a slight modification to the code, it fails when creating the term-document matrix (tdm). I found only one other question on this topic, on Stack Overflow, and it has no answer that leads to a solution.

Here is my code for a reproducible example, though you would need your own twitteR tokens, etc., to run it.

Any idea why the tdm step fails?

library(twitteR)
library(tm)
library(wordcloud)
library(RColorBrewer)

mach_tweets = searchTwitter("#machine", n=50, lang="en")
mach_text = sapply(mach_tweets, function(x) x$getText())
mach_corpus = Corpus(VectorSource(mach_text))

# create term-document matrix, applying some transformations
tdm = TermDocumentMatrix(mach_corpus,
   control = list(removePunctuation = TRUE,
                  #stopwords = c(stopwords()),
                  removeNumbers = TRUE, tolower = TRUE))

# define tdm as matrix
m = as.matrix(tdm)
# get word counts in decreasing order
word_freqs = sort(rowSums(m), decreasing=TRUE)
# create a data frame with words and their frequencies
dm = data.frame(word=names(word_freqs), freq=word_freqs)

wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "Dark2"))



In fact, earlier today, on a different computer from the one I am using now, I wrote the following function, and it works perfectly:

tweets <- function(string, n, min){
    tweets <- searchTwitter(as.character(string), n=n)
    tweets_text <- sapply(tweets, function(x) x$getText())
    tweets_text_corpus <- Corpus(VectorSource(tweets_text))
    tweets_text_corpus <- tm_map(tweets_text_corpus, removePunctuation)
    tweets_text_corpus <- tm_map(tweets_text_corpus, function(x) removeWords(x, stopwords()))
    #wordcloud(tweets_text_corpus)
    myDtm <- TermDocumentMatrix(tweets_text_corpus, control = list(minWordLength = 1))
    m <- as.matrix(myDtm)
    v <- sort(rowSums(m), decreasing=TRUE)
    #wordcloud(names(v), v, scale = c(4,2), min.freq= min )
    v
}

v <- tweets('#beer', n= 20)

But when I run it on my Mac at home, it also fails at the tdm step.
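
One check that might help narrow this down, assuming (the error message is not shown here, so this is only a guess) that the failure comes from non-ASCII characters such as emoji in the tweet text: strip those characters with base R's iconv() before building the corpus, and see whether the tdm step then succeeds. The sample_text vector below is a hypothetical stand-in for mach_text, not real tweet data.

```r
# Hypothetical stand-in for the character vector returned by
# sapply(mach_tweets, function(x) x$getText()); not real tweet data.
sample_text <- c("I love #machine learning \u2764",
                 "caf\u00e9 robots #machine")

# Replace every byte that cannot be converted to ASCII with an empty string.
clean_text <- iconv(sample_text, from = "UTF-8", to = "ASCII", sub = "")

# tolower() now receives plain ASCII input only.
lower_text <- tolower(clean_text)
print(lower_text)
```

If the tdm step works when the cleaned vector is passed to VectorSource() in place of the raw text, the encoding of the tweets is a likely culprit; if it still fails, the cause lies elsewhere.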



