[R] Why is DocumentTermMatrix showing 0 term?

Patrick Casimir patrcasi at nova.edu
Tue Dec 6 16:25:33 CET 2016


docs has 4 documents, and inspect(docs) shows 4 PlainTextDocuments:


> summary(docs)
          Length Class             Mode
case1.txt 2      PlainTextDocument list
case2.txt 2      PlainTextDocument list
case3.txt 2      PlainTextDocument list
case4.txt 2      PlainTextDocument list


> inspect(docs)
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 4

[[1]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 4564

[[2]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 9312

[[3]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 1388

[[4]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 2366
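The inspect() output above shows that every document still holds thousands of characters, so the text is reaching the corpus. A likely culprit is the toSpace pattern "/|@|nn|" in the code quoted below. Its trailing "|" adds an empty alternative, which matches the empty string at every position, so gsub() puts a space between nearly every pair of characters. The base-R sketch below runs on a made-up sentence and compares that pattern with "/|@|\\|". The second pattern is an assumption about what was intended: replace "/", "@" or a literal "|".

```r
# Hypothetical sample text standing in for one of the case files.
txt <- "Contact the office at info@example.org / room 12 | main building"

# Pattern as posted: the trailing "|" creates an empty alternative,
# so the regex also matches the empty string at every position.
bad <- gsub("/|@|nn|", " ", txt)

# Assumed intended pattern: "/", "@" or a literal "|" (escaped as \\|).
good <- gsub("/|@|\\|", " ", txt)

# Split both results on whitespace to compare the surviving tokens.
toks_bad  <- strsplit(trimws(bad), "\\s+")[[1]]
toks_good <- strsplit(trimws(good), "\\s+")[[1]]

print(toks_bad)   # words broken apart into one-character tokens
print(toks_good)  # whole words survive
```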


________________________________
From: Ista Zahn <istazahn at gmail.com>
Sent: Tuesday, December 6, 2016 10:08:28 AM
To: Patrick Casimir
Cc: r-help at r-project.org
Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?

What is in docs?

What does inspect(docs) say?

--Ista



On Tue, Dec 6, 2016 at 9:29 AM, Patrick Casimir <patrcasi at nova.edu> wrote:
> Thanks Ista. See the code below. I am not sure why the DTM is showing 0 terms. I
> have 4 documents in the corpus, and I was able to apply transformations to
> the documents inside the corpus.
>
>
>> cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
>> dir(cname)
> [1] "case1.txt" "case2.txt" "case3.txt" "case4.txt"
>> library(tm)
>> docs <- Corpus(DirSource(cname))
>> install.packages("magrittr", dependencies = TRUE)
>> library(magrittr)  # provides %>% and extract2(), used by viewDocs()
>> viewDocs <- function(d, n) {d %>% extract2(n) %>% as.character() %>%
>> writeLines()}
>> viewDocs(docs, 1)
>> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
>> docs <- tm_map(docs, toSpace, "/|@|nn|")
>> inspect(docs[1])
>> docs <- tm_map(docs, removePunctuation)
>> docs <- tm_map(docs, removeWords, stopwords("english"))
>> inspect(docs[1])
>> docs <- tm_map(docs, stripWhitespace)
>> docs <- tm_map(docs, stemDocument)
>> dtm <- DocumentTermMatrix(docs)
>> dtm
> <<DocumentTermMatrix (documents: 4, terms: 0)>>
> Non-/sparse entries: 0/0
> Sparsity           : 100%
> Maximal term length: 0
> Weighting          : term frequency (tf)
>>
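A DTM with "terms: 0", even though every document has content, is what you would expect if every token is too short to be kept. By default DocumentTermMatrix() (through termFreq()) discards words shorter than 3 characters (wordLengths = c(3, Inf)). The toSpace pattern "/|@|nn|" above has a trailing "|" that matches the empty string, so it leaves mostly one-character tokens. The base-R sketch below imitates that length filter on a made-up sentence. It only mimics tm's length rule and does not call tm itself.

```r
# Made-up document text; not from the original case files.
doc <- "Patient history and case notes"

# Apply the posted toSpace pattern: the empty alternative after the last "|"
# puts a space between the characters of every word.
exploded <- gsub("/|@|nn|", " ", doc)
tokens   <- strsplit(trimws(exploded), "\\s+")[[1]]

# Imitate tm's default minimum word length (wordLengths = c(3, Inf)).
min_len <- 3
kept    <- tokens[nchar(tokens) >= min_len]

length(tokens)  # plenty of tokens...
length(kept)    # ...but none long enough to become a term
```

Passing control = list(wordLengths = c(1, Inf)) to DocumentTermMatrix() should make the single characters appear as terms. That is a quick way to confirm the diagnosis, but the real fix is the pattern itself.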
>
>
>
>
> ________________________________
> From: Ista Zahn <istazahn at gmail.com>
> Sent: Tuesday, December 6, 2016 9:09:37 AM
> To: Patrick Casimir
> Cc: r-help at r-project.org
> Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?
>
>
> Hi Patrick,
>
> How could anyone possibly answer this question with only the information
> you've provided? It's like showing me an empty cup and asking why it's
> empty. Maybe you didn't put anything in it. Maybe you did and then your dog
> drank it or your cat knocked it over or your girlfriend drank it. How would
> I possibly know?
>
> Bottom line, you need to show exactly what you did to produce that result,
> preferably in the form of a few lines of code that we can run to reproduce
> your problem.
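A self-contained reproduction of the kind asked for here could build the corpus from in-memory strings with VCorpus(VectorSource(...)) instead of a local folder, so anyone can run it. The sketch below assumes the tm package is installed. The two sentences are invented, and "/|@|\\|" is an assumed correction of the posted toSpace pattern, not something confirmed in the thread.

```r
library(tm)

# Invented stand-ins for the case files, held in memory.
texts <- c("Patient history and case notes",
           "Follow-up visit / lab results")
docs  <- VCorpus(VectorSource(texts))

# Same helper as in the original post.
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Pattern as posted (the trailing "|" is an empty alternative).
dtm_bad <- DocumentTermMatrix(tm_map(docs, toSpace, "/|@|nn|"))

# Assumed intended pattern: "/", "@" or a literal "|".
dtm_good <- DocumentTermMatrix(tm_map(docs, toSpace, "/|@|\\|"))

nTerms(dtm_bad)   # 0, like the posted result
nTerms(dtm_good)  # non-zero once words stay intact
```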
>
> Finally, you may find it helpful to take some time to learn how to ask
> questions the smart way. http://catb.org/~esr/faqs/smart-questions.html is a
> good place to learn this important skill.
>
> Best,
> Ista
>
>
> On Dec 6, 2016 7:58 AM, "Patrick Casimir" <patrcasi at nova.edu> wrote:
>
> <<DocumentTermMatrix (documents: 4, terms: 0)>>
> Non-/sparse entries: 0/0
> Sparsity           : 100%
> Maximal term length: 0
> Weighting          : term frequency (tf)
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
