[R] Reading pdf file into R

Marc Schwartz marc_schwartz at me.com
Mon Aug 27 21:03:23 CEST 2012


I was going to point you to the same utility, which I have used over the years on both Linux and OS X. A Google search for "pdf to text" will bring up a variety of non-R-related possibilities.

It is possible that somebody, somewhere has built an interface in R to pdftotext, such as a wrapper function that calls pdftotext via system().

I don't see anything obvious on CRAN, and if such a thing existed, it would most sensibly build on the functionality of pdftotext, which is pretty mature, rather than being developed from scratch.
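For illustration, a minimal sketch of what such a wrapper might look like. The function names pdf_to_text() and extract_numbers() are hypothetical and do not come from any package. The sketch assumes a Unix-like system with pdftotext installed and on the PATH, and it uses system2() rather than system() only because it reports the exit status more conveniently. The number extraction is a crude regular expression, so treat it as a starting point rather than a robust parser.

```r
# Hypothetical wrapper: convert a PDF to text by calling the external
# pdftotext program (assumed to be installed and on the PATH).
pdf_to_text <- function(pdf_file, layout = TRUE) {
  if (!file.exists(pdf_file)) {
    stop("File not found: ", pdf_file)
  }
  if (!nzchar(Sys.which("pdftotext"))) {
    stop("pdftotext was not found on the PATH")
  }
  txt_file <- tempfile(fileext = ".txt")
  on.exit(unlink(txt_file))
  # -layout asks pdftotext to keep the physical page layout, which
  # tends to keep table columns aligned.
  args <- c(if (layout) "-layout", shQuote(pdf_file), shQuote(txt_file))
  status <- system2("pdftotext", args)
  if (status != 0) {
    stop("pdftotext exited with status ", status)
  }
  readLines(txt_file, warn = FALSE)
}

# Hypothetical helper: pull every number-like token out of the text lines.
# This is deliberately crude: it will also pick up page numbers, dates, etc.
extract_numbers <- function(lines) {
  pattern <- "-?[0-9]+\\.?[0-9]*([eE][-+]?[0-9]+)?"
  tokens <- unlist(regmatches(lines, gregexpr(pattern, lines)))
  as.numeric(tokens)
}

# Usage ("report.pdf" is a placeholder file name):
# lines <- pdf_to_text("report.pdf")
# x <- extract_numbers(lines)
```

If the PDF contains a regular table, passing the relevant lines to read.table(text = ...) is likely to be a better fit than the regular expression above, since it keeps the column structure.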

Of course, the use of pdftotext itself is predicated upon the source PDF not being a scanned image of a text page, in which case you would need an OCR-based application.

Regards,

Marc Schwartz

On Aug 27, 2012, at 1:48 PM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:

> Thanks Berend for your reply. However, I was expecting something might be
> available within R itself (or perhaps in some add-on package).
> 
> Thanks and regards,
> 
> On Tue, Aug 28, 2012 at 12:23 AM, Berend Hasselman <bhh at xs4all.nl> wrote:
>> 
>> On 27-08-2012, at 20:21, Christofer Bogaso wrote:
>> 
>>> Dear all, I have got a pdf file with a lot of numerical data which I
>>> want to export to R for some analysis. Is there any way to do that?
>>> 
>>> Thanks for your time.
>> 
>> 
>> Possibly the program pdftotext from xpdf tools (http://www.foolabs.com/xpdf/download.html) could help you.
>> 
>> Berend



