[R] Reading table data from PDF files

jim holtman jholtman at gmail.com
Sat Feb 4 01:36:22 CET 2012


I think a lot would depend on exactly how the data is formatted.  I
have used 'pdf2text' converters (many are freely available on the web)
to convert the PDFs to text, and then used R to read in and preprocess
the text to get it into a format I could work with.

You can invoke these converters with the 'system' function and then
read the output file that is generated.  I would think that you would
need some custom code to then interpret the data in the text file,
depending on how it was created.
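
A minimal sketch of that approach, under some assumptions: it uses the
'pdftotext' command-line tool (from Xpdf/Poppler) as one example of
such a converter, and it assumes table columns are separated by two or
more spaces in the text output.  The function names, the sample lines,
and the column layout are made up for illustration; real files will
almost certainly need different parsing rules.

```r
# Sketch only: assumes the 'pdftotext' program is installed and on the PATH.
pdf_to_lines <- function(pdf_file, txt_file = tempfile(fileext = ".txt")) {
  # -layout asks pdftotext to keep the physical layout of the page,
  # which tends to keep table columns aligned in the text output
  status <- system2("pdftotext",
                    args = c("-layout", shQuote(pdf_file), shQuote(txt_file)))
  if (status != 0) stop("pdftotext failed for ", pdf_file)
  readLines(txt_file, warn = FALSE)
}

# Hypothetical parser: keeps only lines that split into 'n_cols' fields
# on runs of 2+ spaces, and treats the first such line as the header.
parse_table_lines <- function(lines, n_cols) {
  fields <- strsplit(trimws(lines), "[[:space:]]{2,}")
  fields <- fields[lengths(fields) == n_cols]
  if (length(fields) < 2) stop("no table rows found")
  rows <- do.call(rbind, fields[-1])
  df <- as.data.frame(rows, stringsAsFactors = FALSE)
  names(df) <- fields[[1]]
  # turn columns that look numeric into numbers, leave the rest as text
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

# Made-up lines standing in for converter output
sample_lines <- c(
  "Water depth summary",
  "",
  "Site      Date          Depth",
  "EDEN 3    2012-01-05    1.25",
  "EDEN 7    2012-01-05    0.80",
  "Page 1"
)
tbl <- parse_table_lines(sample_lines, n_cols = 3)
print(tbl)
```

With a real file this would be something like
parse_table_lines(pdf_to_lines("report.pdf"), n_cols = 3), where
"report.pdf" is a placeholder name.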

So I am sure you can do it within R, with some auxiliary functions
that are called with 'system', without much trouble.
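
For a large batch of files, one such auxiliary function might look
roughly like the sketch below.  It assumes a converter that is called
as "converter -layout input.pdf output.txt" (as 'pdftotext' is); the
function name and its arguments are hypothetical, so adjust them to
whatever converter you actually use.

```r
# Sketch only: converts every PDF in a directory to text and reports
# which conversions succeeded.  The default converter is an assumption.
convert_pdf_dir <- function(pdf_dir, out_dir = pdf_dir,
                            converter = "pdftotext") {
  pdfs <- list.files(pdf_dir, pattern = "\\.pdf$",
                     ignore.case = TRUE, full.names = TRUE)
  # name each output file after its PDF, with a .txt extension
  txts <- file.path(out_dir,
                    sub("\\.pdf$", ".txt", basename(pdfs),
                        ignore.case = TRUE))
  status <- integer(length(pdfs))
  for (i in seq_along(pdfs)) {
    # system2 returns the exit status of the command (0 means success)
    status[i] <- system2(converter,
                         args = c("-layout", shQuote(pdfs[i]),
                                  shQuote(txts[i])))
  }
  data.frame(pdf = pdfs, txt = txts, ok = status == 0,
             stringsAsFactors = FALSE)
}
```

The returned data frame gives one row per PDF, so files that failed to
convert can be picked out with the 'ok' column before any parsing is
attempted.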

On Fri, Feb 3, 2012 at 4:11 PM, Bryan McCloskey <bmccloskey at usgs.gov> wrote:
> All,
>
> Is anyone familiar with a way to use R to read table data from a large collection of PDF files? I'm aware there are various command-line and desktop utilities that might be able to, e.g., dump PDFs to text, which could then be parsed for table data. But I'm hoping there is something more integrated that could be incorporated into R functions and scripts to handle large batches of PDFs in a more automated fashion.
>
> Has anyone used R to extract large amounts of tabular data from PDF documents?
>
> -bryan
>
> ------
> Bryan McCloskey, Ph.D.
> IT Specialist (Data Management/Internet)
> U.S. Geological Survey
> St. Petersburg Coastal & Marine Science Center
> 600 Fourth St. South
> St. Petersburg, FL 33701
>
> South Florida Information Access: http://sofia.usgs.gov
> Everglades Depth Estimation Network: http://sofia.usgs.gov/eden
> Phone: 727.803.8747x3017 * Fax: 727.803.2032
> ------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


