[R] find similar words in text

Patrick Casimir patrcasi at nova.edu
Fri Aug 4 01:38:47 CEST 2017


Use the tm package: create a corpus, build a term-document matrix (TDM) from it, and then apply as.matrix() to the TDM to display each term's occurrences. The tm package documentation is on CRAN.
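
A minimal sketch of that workflow, assuming the tm package is installed. The two-sentence moby.text vector is a made-up stand-in for the real book text, and the variable names are only illustrative:

```r
library(tm)

# Hypothetical stand-in for the text of Moby Dick (one element per document).
moby.text <- c("The whale and the whalers went whaling.",
               "Whales saw the whale-ship; the whaler saw the whale.")

# Build a corpus, lower-case it, and strip punctuation
# (keeping dashes inside words so "whale-ship" stays one term).
corpus <- VCorpus(VectorSource(moby.text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation, preserve_intra_word_dashes = TRUE)

# Term-document matrix: one row per term, one column per document.
tdm <- TermDocumentMatrix(corpus)
tdm.m <- as.matrix(tdm)

# Total frequency of each term across all documents.
term.freq <- rowSums(tdm.m)

# Keep only the terms that start with "whal" and show them by frequency.
whal.freq <- term.freq[grep("^whal", names(term.freq))]
print(sort(whal.freq, decreasing = TRUE))
```

Note that TermDocumentMatrix() drops terms shorter than three characters by default, which does not affect the whal* terms here.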
________________________________
From: R-help <r-help-bounces at r-project.org> on behalf of Boris Steipe <boris.steipe at utoronto.ca>
Sent: Thursday, August 3, 2017 6:40:09 PM
To: Riaan Van Der Walt
Cc: R lists
Subject: Re: [R] find similar words in text

Please keep messages on the list so others can pitch in.

_Which_ words do you want to consider identical for the purpose of frequency count?
_What_ do you want to plot?



B.
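
Regarding the script quoted below: grep() returns the index positions of the matches by default, which may be why the result is hard to display. A rough base-R sketch follows, assuming moby.word.v is a character vector with one lower-cased word per element (the short vector here is a made-up stand-in for it):

```r
# Hypothetical stand-in for moby.word.v: one lower-cased word per element.
moby.word.v <- c("the", "whale", "and", "the", "whalers", "went", "whaling",
                 "whales", "saw", "the", "whale-ship", "whale's", "whale")

# value = TRUE makes grep() return the matching words instead of positions.
whales.v <- grep("^whal", moby.word.v, value = TRUE)

# table() counts each distinct form; sort() orders the counts.
whales.freq <- sort(table(whales.v), decreasing = TRUE)
print(whales.freq)

# Total number of matches across all forms.
print(sum(whales.freq))

# One bar per word form; las = 2 rotates the labels so they fit.
barplot(whales.freq, las = 2, main = "whal* word forms")
```

This counts every distinct spelling separately. Whether some of them should be merged (for example by stemming) depends on the answer to the first question above.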



> On Aug 3, 2017, at 4:36 PM, Riaan Van Der Walt <Riaan.VanDerWalt at nwu.ac.za> wrote:
>
> Hallo Boris,
> I've loaded the Rstem and Snowball packages.
> But I am clueless how to get a list of, e.g., whal* (whale, whales, whaling, whaler, whalers, whaleman, whalemen, whale-ship, whale-boat, whale's)
> in the book Moby Dick and the frequency of each of the different words.
> I am using this script:
>
> whales1.v <- grep("^whal.*", moby.word.v)
> whales1.v
>
> The total number of occurrences of whal* is 1699.
> But I can't display or plot them.
>
> I am new to R and the learning curve is steep!!
>
> Thx!
> Riaan
>
>
> Riaan van der Walt
> Tel / Phone / Mogala : 27+72+2172429
> Email / Epos / Emeile: riaan.vanderwalt at nwu.ac.za
> Url: http://www.nwu.ac.za/
>
> >>> Boris Steipe <boris.steipe at utoronto.ca> 31 Jul 2017 23:37 >>>
> You need a stemming algorithm. See here:
>   https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>
> Myself, I've had good experience with Rstem.
>
> B.
>
>
>
>
>
> > On Jul 31, 2017, at 4:47 PM, Riaan Van Der Walt <Riaan.VanDerWalt at nwu.ac.za> wrote:
> >
> > I am new to R.
> > Busy with Text Analysis.
> >
> > Need a script to find, e.g.,
> >
> > whale, whales, whale's, whaler, whalers, whaling,... in Moby Dick
> >
> > Riaan
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
