[R] TM reader with text

David Winsemius dwinsemius at comcast.net
Thu Mar 1 00:49:20 CET 2012

On Feb 29, 2012, at 6:00 PM, Mickael R problem wrote:

> Hello everybody,
> I am trying to work with TM, but I have a problem with some special words in
> French. I think this is due to the way the PDF was converted to text, but I am
> not entirely sure.
> Here is an example:
> findFreqTerms(tdm1,30)
>    [33] "<U+F0A3>"            "<U+FB01>n"           "<U+FB01>nancement"
> "<U+FB01>nancier"     "<U+FB01>nancière"    "<U+FB01>nancières"
> "<U+FB01>nanciers"    "<U+FB01>xe"
> Some French words are not read correctly by TM with the readPlain reader. I
> tried to use reader = readPDF, but it did not work, so I had to convert the
> PDF to text first. Some words are not recognized, so when I build a
> TermDocumentMatrix a word like "inflation" disappears. This is a big problem
> for me, and I have spent a lot of time on it. Any ideas? Thanks for your time.

You included no information about your platform, locale settings, or  
encoding of the text.
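
A minimal sketch of how those details could be collected, and of one possible workaround. It rests on an assumption that the post does not confirm: that the odd terms are typographic ligatures left by the PDF-to-text conversion (U+FB01 is the single-character "fi" ligature, U+FB02 the "fl" ligature, which would also explain why "inflation" goes missing). The vector x and the helper expand_ligatures() are hypothetical names used only for illustration.

```r
# Details worth including in a report like this one
sessionInfo()                # R version, platform, attached packages
Sys.getlocale("LC_CTYPE")    # character-set locale currently in use

# Hypothetical sample standing in for text extracted from the PDF
x <- c("\ufb01nancement", "in\ufb02ation", "\ufb01xe")
Encoding(x)                  # declared encoding of each string

# Assumption: the unexpected code points are the "fi" and "fl" ligatures.
# Replace each ligature with the two plain letters it stands for.
expand_ligatures <- function(s) {
  s <- gsub("\ufb01", "fi", s, fixed = TRUE)
  s <- gsub("\ufb02", "fl", s, fixed = TRUE)
  s
}

expand_ligatures(x)
```

If the assumption holds, applying a replacement like this to the document text before building the TermDocumentMatrix should let the affected words be counted under their usual spelling. It does not address "<U+F0A3>", which is a different kind of character.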



David Winsemius, MD
West Hartford, CT
