[R] Why is DocumentTermMatrix showing 0 term?

Patrick Casimir patrcasi at nova.edu
Tue Dec 6 21:00:25 CET 2016



Will do.

________________________________
From: Ista Zahn <istazahn at gmail.com>
Sent: Tuesday, December 6, 2016 2:49:30 PM
To: Patrick Casimir
Cc: r-help at r-project.org
Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?

On Tue, Dec 6, 2016 at 2:28 PM, Patrick Casimir <patrcasi at nova.edu> wrote:
> Actually, the DTM works now. This is amazing. A million thanks. Why wasn't it
> working before?

Do as I suggested: start adding back your tm_map calls one at a time until
you find the one that breaks it.

--Ista
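Ista's suggestion amounts to bisecting the preprocessing pipeline. Below is a minimal R sketch of that idea. It assumes the tm package is installed. The two short documents are made-up placeholders for case1.txt to case4.txt, and the step names are labels chosen here. stemDocument is left out because it also needs SnowballC, but it can be appended to `steps` the same way. After each cumulative step the sketch records nTerms() of a fresh DocumentTermMatrix, so the first step where the count collapses points at the culprit.

```r
# Sketch, assuming tm is installed; the documents are made-up placeholders.
library(tm)

docs <- VCorpus(VectorSource(c(
  "Patients reported mild symptoms after the first visit.",
  "Follow-up notes: symptoms resolved / no further visits @ clinic."
)))

toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Steps in the same order as the original script; names label the report.
steps <- list(
  toSpace           = function(d) tm_map(d, toSpace, "/|@|nn|"),
  removePunctuation = function(d) tm_map(d, removePunctuation),
  removeWords       = function(d) tm_map(d, removeWords, stopwords("english")),
  stripWhitespace   = function(d) tm_map(d, stripWhitespace)
)

# Apply the steps cumulatively and record how many terms survive each one.
term_counts <- integer(0)
current <- docs
term_counts["none"] <- nTerms(DocumentTermMatrix(current))
for (step in names(steps)) {
  current <- steps[[step]](current)
  term_counts[step] <- nTerms(DocumentTermMatrix(current))
}
print(term_counts)
```

With the pattern used in the original script, the count should drop to zero at the toSpace step and stay there.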

>
> See below:
>
>
>> cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
>> docs <- Corpus(DirSource(cname))
>> dtm <- DocumentTermMatrix(docs)
>> dtm
> <<DocumentTermMatrix (documents: 4, terms: 766)>>
> Non-/sparse entries: 920/2144
> Sparsity           : 70%
> Maximal term length: 29
> Weighting          : term frequency (tf)
>
>
>
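The contrast between 766 terms here and 0 terms after preprocessing is consistent with the default word-length filter. By default, DocumentTermMatrix() discards words shorter than 3 characters (the termFreq control option wordLengths = c(3, Inf)). The small sketch below assumes tm is installed and uses a made-up one-line document. It shows how changing that control option affects which terms survive.

```r
# Sketch, assuming tm is installed; the one-line document is made up.
library(tm)

docs <- VCorpus(VectorSource("a b c cat dog"))

# Default control: words shorter than 3 characters are discarded.
default_terms <- Terms(DocumentTermMatrix(docs))

# Lowering the minimum word length keeps the one-letter tokens as well.
all_terms <- Terms(DocumentTermMatrix(docs,
                                      control = list(wordLengths = c(1, Inf))))

print(default_terms)
print(all_terms)
```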
> ________________________________
> From: Ista Zahn <istazahn at gmail.com>
> Sent: Tuesday, December 6, 2016 12:20:57 PM
>
> To: Patrick Casimir
> Cc: r-help at r-project.org
> Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?
>
> Does
>
> cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
> docs <- Corpus(DirSource(cname))
> dtm <- DocumentTermMatrix(docs)
> dtm
>
> work?
>
> If so, start adding back your tm_map calls one at a time until you find
> the one that breaks it.
>
> Best,
> Ista
>
> On Tue, Dec 6, 2016 at 10:25 AM, Patrick Casimir <patrcasi at nova.edu> wrote:
>>
>> docs has 4 documents, and inspect(docs) shows 4 PlainTextDocuments.
>>
>>
>>> summary(docs)
>>           Length Class             Mode
>> case1.txt 2      PlainTextDocument list
>> case2.txt 2      PlainTextDocument list
>> case3.txt 2      PlainTextDocument list
>> case4.txt 2      PlainTextDocument list
>>
>>> inspect(docs)
>> <<VCorpus>>
>> Metadata:  corpus specific: 0, document level (indexed): 0
>> Content:  documents: 4
>>
>> [[1]]
>> <<PlainTextDocument>>
>> Metadata:  7
>> Content:  chars: 4564
>>
>> [[2]]
>> <<PlainTextDocument>>
>> Metadata:  7
>> Content:  chars: 9312
>>
>> [[3]]
>> <<PlainTextDocument>>
>> Metadata:  7
>> Content:  chars: 1388
>>
>> [[4]]
>> <<PlainTextDocument>>
>> Metadata:  7
>> Content:  chars: 2366
>>
>>
>>
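On this VCorpus, inspect() reports only metadata and character counts, not the text itself, so it cannot show what a transformation did to the words. The script's viewDocs() helper does print the text. However, it relies on %>% and extract2() from magrittr, and the transcript shows install.packages("magrittr") but no library(magrittr) call. The sketch below attaches magrittr first. It assumes tm and magrittr are installed and uses a made-up sentence in place of the MyCorpus files.

```r
# Sketch, assuming tm and magrittr are installed; the sentence is made up.
library(tm)
library(magrittr)  # provides %>% and extract2(); installing alone is not enough

docs <- VCorpus(VectorSource("Patients reported symptoms / clinic @ visit"))

# The helper from the script: pull out document n and print its text.
viewDocs <- function(d, n) {
  d %>% extract2(n) %>% as.character() %>% writeLines()
}

viewDocs(docs, 1)                      # text before any transformation
cleaned <- tm_map(docs, removePunctuation)
viewDocs(cleaned, 1)                   # text after removing punctuation
```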
>> ________________________________
>> From: Ista Zahn <istazahn at gmail.com>
>> Sent: Tuesday, December 6, 2016 10:08:28 AM
>>
>> To: Patrick Casimir
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?
>>
>> What is in docs?
>>
>> What does
>>
>> inspect(docs)
>>
>> say?
>>
>> --Ista
>>
>>
>>
>> On Tue, Dec 6, 2016 at 9:29 AM, Patrick Casimir <patrcasi at nova.edu> wrote:
>>> Thanks, Ista. See the code below. I am not sure why the DTM is showing 0
>>> terms. I have 4 documents in the corpus, and I was able to apply
>>> transformations to the documents inside the corpus.
>>>
>>>
>>>> cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
>>>> dir(cname)
>>> [1] "case1.txt" "case2.txt" "case3.txt" "case4.txt"
>>>> library(tm)
>>>> docs <- Corpus(DirSource(cname))
>>>> install.packages("magrittr" ,dependencies=TRUE)
>>>> viewDocs <- function(d, n) {d %>% extract2(n) %>% as.character() %>% writeLines()}
>>>> viewDocs(docs, 1)
>>>> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
>>>> docs <- tm_map(docs, toSpace, "/|@|nn|")
>>>> inspect(docs[1])
>>>> docs <- tm_map(docs, removePunctuation)
>>>> docs <- tm_map(docs, removeWords, stopwords("english"))
>>>> inspect(docs[1])
>>>> docs <- tm_map(docs, stripWhitespace)
>>>> docs <- tm_map(docs, stemDocument)
>>>> dtm <- DocumentTermMatrix(docs)
>>>> dtm
>>> <<DocumentTermMatrix (documents: 4, terms: 0)>>
>>> Non-/sparse entries: 0/0
>>> Sparsity           : 100%
>>> Maximal term length: 0
>>> Weighting          : term frequency (tf)
>>>>
>>>
>>>
>>>
>>>
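One possible explanation, offered as a hypothesis rather than a confirmed diagnosis, is the toSpace pattern "/|@|nn|". It ends with "|", which adds an empty alternative. An empty alternative matches at every position, so gsub() inserts a space between every character. DocumentTermMatrix() then drops all the resulting one-character words under its default minimum word length of 3. The base-R sketch below uses a made-up sentence. A whitespace split plus a length filter stands in for tm's tokenizer. The "fixed" pattern simply drops the trailing "|" and is not necessarily what the author intended.

```r
# Base-R sketch; the sample sentence is made up.
txt <- "Patients reported symptoms / clinic @ visit"

# Pattern from the script: the trailing "|" adds an empty alternative,
# which matches the empty string at every position.
broken <- gsub("/|@|nn|", " ", txt)

# Same alternatives without the trailing "|".
fixed <- gsub("/|@|nn", " ", txt)

# Split on whitespace, roughly as a whitespace tokenizer would.
tokens <- function(s) strsplit(trimws(s), "[[:space:]]+")[[1]]

# Mimic the default minimum word length of 3 used by DocumentTermMatrix().
kept <- function(s) {
  tk <- tokens(s)
  tk[nchar(tk) >= 3]
}

print(broken)
print(kept(broken))  # every token is a single character, so nothing survives
print(kept(fixed))
```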
>>> ________________________________
>>> From: Ista Zahn <istazahn at gmail.com>
>>> Sent: Tuesday, December 6, 2016 9:09:37 AM
>>> To: Patrick Casimir
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?
>>>
>>>
>>> Hi Patrick,
>>>
>>> How could anyone possibly answer this question with only the information
>>> you've provided? It's like showing me an empty cup and asking why it's
>>> empty. Maybe you didn't put anything in it. Maybe you did and then your
>>> dog drank it, or your cat knocked it over, or your girlfriend drank it.
>>> How would I possibly know?
>>>
>>> Bottom line, you need to show exactly what you did to produce that
>>> result, preferably in the form of a few lines of code that we can run to
>>> reproduce your problem.
>>>
>>> Finally, you may find it helpful to take some time to learn how to ask
>>> questions the smart way. http://catb.org/~esr/faqs/smart-questions.html
>>> is a good place to learn this important skill.
>>>
>>> Best,
>>> Ista
>>>
>>>
>>> On Dec 6, 2016 7:58 AM, "Patrick Casimir" <patrcasi at nova.edu> wrote:
>>>
>>> <<DocumentTermMatrix (documents: 4, terms: 0)>>
>>> Non-/sparse entries: 0/0
>>> Sparsity           : 100%
>>> Maximal term length: 0
>>> Weighting          : term frequency (tf)
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
