[R] word stemming for corpus linguistics

Tue Jul 26 10:13:51 CEST 2016

Hi Paul

I have seen this - it's part of the tm package mentioned originally. So, 
I've tried it again and perhaps I'm using stemDocument incorrectly, but 
this is what I am doing:

> library(tm)
Loading required package: NLP
> text.v <- scan(file.choose(), what = 'char', sep = '\n')
Read 938 items
> text.stem.v <- stemDocument(text.v, language = 'english')

But it isn't changing anything in the body of the text I'm passing to 
it: the words come back unlemmatized/unstemmed.

When I try using SnowballC, the error returned is that tm_map doesn't 
have a method to work with objects of class 'character'.
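
One possible explanation, offered as an assumption rather than a confirmed diagnosis: in tm releases of this period, calling stemDocument on a plain character vector appears to hand each element to SnowballC::wordStem as if it were a single word, so a whole line of text comes back essentially unchanged. The sketch below splits each line into words first, stems them, and pastes the line back together. The name stem_lines is a hypothetical helper, and splitting on whitespace is a simplification that leaves punctuation attached to words.

```r
library(SnowballC)

# Hypothetical helper: stem every word in each line of a character vector.
stem_lines <- function(lines, language = "english") {
  vapply(lines, function(line) {
    # Simplified tokenizer: split on runs of whitespace.
    words <- strsplit(line, "[[:space:]]+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) == 0) return("")
    # wordStem() stems a vector of single words.
    paste(wordStem(words, language = language), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

example.v <- c("facilitating facilitated facilitates")
stem_lines(example.v)
```

As for the tm_map error: tm_map operates on corpus objects, so a character vector would first need wrapping, for instance with VCorpus(VectorSource(text.v)), before tm_map(corpus, stemDocument) can be applied.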

Again, the problem is that tm doesn't seem to allow for concordance 
analysis. Perhaps it does and I just haven't figured out how to do it, 
so I'd be happy to be pointed to some documentation on that process, and 
on whether it is applied before or after the text is transformed into a 
DTM, because searching online hasn't (yet) turned anything up.
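
On the before-or-after question: a DTM keeps only term counts per document and discards word order, so a concordance has to be built from the running text before that transformation. A minimal base R sketch of keyword-in-context extraction follows. The name kwic_lines and its arguments are hypothetical, the tokenizer is deliberately crude, and context windows are allowed to run across line boundaries.

```r
# Hypothetical helper: keyword-in-context lines for one search pattern.
kwic_lines <- function(lines, pattern, window = 3) {
  # Lower-case the text and split on anything that is not a letter or apostrophe.
  tokens <- unlist(strsplit(tolower(lines), "[^[:alpha:]']+"))
  tokens <- tokens[nzchar(tokens)]
  # Positions of the tokens that match the regular expression.
  hits <- grep(pattern, tokens)
  vapply(hits, function(i) {
    left  <- tail(tokens[seq_len(i - 1)], window)
    right <- head(tokens[-seq_len(i)], window)
    paste(paste(left, collapse = " "), "[", tokens[i], "]",
          paste(right, collapse = " "))
  }, character(1))
}

example.v <- c("The committee facilitates the review.",
               "She facilitated a workshop on stemming.")
kwic_lines(example.v, "^facilitat", window = 2)
```

Matching on a pattern such as "^facilitat" groups the inflected forms without stemming the text itself, which keeps the context words readable.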

Thanks.
Andy

On 26/07/16 08:50, Paul Johnston wrote:
> Suggest you look at http://www.inside-r.org/packages/cran/tm/docs/stemDocument
>
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Andy Wolfe
> Sent: 26 July 2016 08:10
> To: r-help at r-project.org
> Subject: [R] word stemming for corpus linguistics
>
> Hi list
>
> For a piece of work I'm doing in corpus linguistics, I'm using a combination of two texts, Gries's "Quantitative Corpus Linguistics with R: A Practical Introduction" and Jockers's "Text Analysis with R for Students of Literature", which are both really excellent by the way. I want to stem or lemmatize the words so that, e.g., 'facilitating', 'facilitated', and 'facilitates' all become 'facilit'.
>
> In text mining, using a combination of the packages 'tm' and 'SnowballC'
> this is feasible, but I am finding that working with the DTM (document term matrix) becomes difficult when I want to do concordance (or keyword-in-context) analysis.
>
> So, two questions:
>
> (1) is there a package for R version 3.3.1 that supports corpus linguistics? and/or
>
> (2) is there a way of doing concordance analysis using the tm package as part of the whole text mining process?
>
> I appreciate any help. Thanks.
>
> Andy