[R] Mining non-English text

Loris Bennett loris.bennett at fu-berlin.de
Wed Mar 4 08:52:05 CET 2015


saikiran putta <putta.saikiran1994 at gmail.com> writes:

> I am new to R programming and trying to mine this pdf file
> http://164.100.180.82/Rollpdf/AC276/S24A276P001.pdf. This pdf file is in
> non-English language and I'm not able to figure out how to proceed. And,
> I'm not even sure how to extract information from a PDF file, so please
> help!

Nothing to do with R, but the command-line program pdftotext might help
you to get going and is available for Linux and, apparently, for
Windows.  It can deal with various output encodings via its -enc option.
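
As a rough sketch of how that might look when scripted, assuming
pdftotext (from Xpdf or Poppler) is installed and on the PATH: the
helper names and the file names below are made up for illustration, and
I have not tried this on the file in question.  If the PDF is a scanned
image or its fonts lack a Unicode mapping, the output may well be empty
or garbled.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, txt_path, encoding="UTF-8", keep_layout=True):
    """Build the argument list for a pdftotext call (hypothetical helper).

    -enc selects the output text encoding; -layout asks pdftotext to
    preserve the physical layout of the page as far as it can.
    """
    cmd = ["pdftotext", "-enc", encoding]
    if keep_layout:
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def pdf_to_text(pdf_path, txt_path):
    """Convert a PDF to a UTF-8 text file and return its contents."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_cmd(pdf_path, txt_path), check=True)
    with open(txt_path, encoding="utf-8") as fh:
        return fh.read()


# Example call, assuming the PDF has already been downloaded locally:
# text = pdf_to_text("S24A276P001.pdf", "S24A276P001.txt")
```

The resulting text file could then be read into R with readLines(),
passing encoding = "UTF-8".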

Cheers,

Loris

-- 
This signature is currently under construction.


