[R] Using a text file as a removeWord dictionary in tm_map

jim holtman jholtman at gmail.com
Sun Mar 1 22:13:23 CET 2015


The 'read.table' call was creating a data.frame (not a vector), and applying
'c' to it converted it to a list.  You should always look at the object
you are creating.  You probably want to use 'scan', which with what = ''
returns a plain character vector.
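Applied to the stopList.txt file described in the question below, a minimal
sketch of the 'scan' approach might look like this.  It assumes the terms are
comma-separated and wrapped in double quotes; the temporary file and the
three sample terms are placeholders, not the poster's real data.

```r
# Placeholder file standing in for ~/path/to/file/stopList.txt, assuming the
# terms are comma-separated and each wrapped in double quotes.
stop_file <- tempfile(fileext = ".txt")
writeLines('"although","this","query"', stop_file)

# scan() returns a character vector; quote = "\"" treats the double quotes
# as field delimiters, so they are not part of the terms.
stopDict <- scan(stop_file, what = "", sep = ",", quote = "\"",
                 fileEncoding = "UTF-8", quiet = TRUE)
stopDict <- trimws(stopDict)  # remove any leading/trailing whitespace

str(stopDict)
# chr [1:3] "although" "this" "query"
```

removeWords takes a character vector of words to remove, so the call from the
question, 'docs <- tm_map(docs, removeWords, stopDict)', should then behave
as intended.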

======================
> testFile <- "Although,this,query,applies,specifically,to,the,tm,package"
> # read in with read.table create a data.frame
> df_words <- read.table(text = testFile, sep = ',')
> df_words  # not a vector
        V1   V2    V3      V4           V5 V6  V7 V8      V9
1 Although this query applies specifically to the tm package
> c(df_words)  # this results in a list
$V1
[1] Although
Levels: Although
$V2
[1] this
Levels: this
$V3
[1] query
Levels: query
$V4
[1] applies
Levels: applies
$V5
[1] specifically
Levels: specifically
$V6
[1] to
Levels: to
$V7
[1] the
Levels: the
$V8
[1] tm
Levels: tm
$V9
[1] package
Levels: package
>
> # now read with 'scan'
> scan_words <- scan(text = testFile, what = '', sep = ',')
Read 9 items
> scan_words
[1] "Although"     "this"         "query"        "applies"      "specifically" "to"
[7] "the"          "tm"           "package"
>
>
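If you would rather keep 'read.table', a sketch of an alternative is to
flatten the one-row data.frame yourself.  This is not what the reply above
recommends, just another route to the same character vector; the names
vec_from_df and scan_words are illustrative.

```r
testFile <- "Although,this,query,applies,specifically,to,the,tm,package"

# read.table() still returns a one-row data.frame; unlist() flattens its
# columns into one vector, and as.character() guards against factor columns
# (stringsAsFactors defaulted to TRUE in R versions before 4.0.0).
df_words <- read.table(text = testFile, sep = ",")
vec_from_df <- as.character(unlist(df_words, use.names = FALSE))

scan_words <- scan(text = testFile, what = "", sep = ",", quiet = TRUE)
identical(vec_from_df, scan_words)
# [1] TRUE
```

'scan' remains the more direct choice, since it never builds the
intermediate data.frame.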

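To see why the stop list has to be a character vector, here is a rough
base-R approximation of word removal.  remove_words_sketch is a hypothetical
helper written for illustration, not tm's own implementation of removeWords.

```r
# Hypothetical helper, NOT tm's code: removes whole words listed in a
# character vector by building a single regular expression.
remove_words_sketch <- function(x, words) {
  # Produces a pattern such as \b(the|to)\b; words containing regex
  # metacharacters would need escaping first.
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  gsub(pattern, "", x, perl = TRUE)
}

txt <- "the tm package applies to text"
remove_words_sketch(txt, c("the", "to"))
# [1] " tm package applies  text"
```

The pattern is only meaningful when 'words' is a vector of strings, which is
why a data.frame or list built by 'read.table' and 'c' is the wrong input.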
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Sat, Feb 28, 2015 at 8:46 AM, Sun Shine <phaedrusv at gmail.com> wrote:
> Hi list
>
> Although this query applies specifically to the tm package, perhaps it's
> something that others might be able to lend a thought to.
>
> Using tm to do some initial text mining, I want to include an externally
> generated (outside R) dictionary of words that I want removed from the corpus.
>
> I have created a comma-separated list of terms, each wrapped in double
> quotes, in a plain UTF-8 file, stopList.txt. I want to read this into R, so do:
>
>> stopDict <- read.table('~/path/to/file/stopList.txt', sep=',')
>
> To pass it to the removeWords function in tm, I do:
>
>> docs <- tm_map(docs, removeWords, stopDict)
>
> which has no effect. Neither does:
>
>> docs <- tm_map(docs, removeWords, c(stopDict))
>
> What am I not seeing or doing?
>
> How do I pass a text file with pre-defined terms to the removeWords
> transform of tm?
>
> Thanks for any ideas.
>
> Cheers
>
> Sun
>


