[R] scanning a pdf scan

roger koenker rkoenker at uiuc.edu
Fri Oct 27 21:42:33 CEST 2006


Thanks for your suggestions.  Trial and error experimentation
with Adobe Acrobat produced the following method:

It looks like it is possible to highlight the numerical part of the
table in Acrobat and then copy/paste into a text file, with about
98 percent accuracy.  Wonders never cease.
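
For anyone following along, here is a minimal sketch of one way the pasted
text might then be turned into a dataframe while flagging the cells that
did not come through cleanly.  It is only an illustration under stated
assumptions: the values and column names below are made up (they are not
from the Peirce tables), every line is assumed to have the same number of
whitespace-separated fields, and the misread cells are assumed to show up
as non-numeric tokens.

```r
# Hypothetical pasted text; the values are invented for illustration.
pasted <- c("12 345 6.7",
            "13 3S2 7.1",    # "S" stands in for a misread digit
            "14 360 7.4")

# Split each line on whitespace and bind the pieces into a character matrix.
# This assumes every line has the same number of fields.
fields <- do.call(rbind, strsplit(trimws(pasted), "[[:space:]]+"))

# Convert to numeric; tokens that are not valid numbers become NA.
df <- as.data.frame(suppressWarnings(apply(fields, 2, as.numeric)))
names(df) <- c("a", "b", "c")   # placeholder column names

# Row/column positions of the cells to check by hand against the scan.
bad <- which(is.na(df), arr.ind = TRUE)
print(bad)
```

Misread digits that still form a valid number would not be caught this
way, so a spot check against the original pages is still needed.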


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    rkoenker at uiuc.edu            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Champaign, IL 61820


On Oct 27, 2006, at 11:52 AM, Gabor Grothendieck wrote:

> I don't have specific experience with this but strapply
> of package gsubfn can extract information from a string by content
> as opposed to delimiters. e.g.
>
>> library(gsubfn)
>> strapply("abc34def56xyz", "[0-9]+", c)[[1]]
> [1] "34" "56"
>
> On 10/27/06, roger koenker <rkoenker at uiuc.edu> wrote:
>> I have a pdf scan of several pages of data from a quite famous old
>> paper by C.S. Peirce (1873).  I would like (what else?) to convert it
>> into an R dataframe.  Somewhat to my surprise the pdf seems to already
>> be in a character recognized form, since I can search for numerical
>> strings and they are nicely found.  Of course, as is usual with such
>> tables there are also headings and column lines, etc. that are less
>> interesting than the numbers themselves.  I've tried saving the pdf in
>> various formats, some of which look vaguely tractable, but I'm hoping
>> that there is something that is more automatic.
>>
>> Does anyone have experience that they could share toward this
>> objective?


