[R] Getting data from a PDF-file into R

joe1985 johannes at dsr.life.ku.dk
Tue Jan 27 08:30:15 CET 2009

Peter Dalgaard wrote:
> 
> joe1985 wrote:
>> Hello
>> 
>> I have around 200 PDF documents containing data I want organized in R as a
>> data frame. The PDF documents look like this:
>> 
>>   http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver.jpeg 
>> 
>> or like this:
>> 
>> http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver%2B2.jpeg 
>> 
>> So I want to pull out the data in the coloured boxes so that it becomes
>> organized like this (just in R instead of Excel):
>> 
>> 
>> http://www.nabble.com/file/p21667074/PRRS-billede%2Bexcel.jpeg 
>> 
>> The 0s and 1s encode the status on a particular date. "PRRS-neg" is
>> represented by a 0 in both the PRRS-VAC and PRRS-DK columns. "PRRS-pos VAC"
>> or "Vac" is represented by a 1 in the PRRS-VAC column, and "PRRS-pos DK" or
>> "DK" by a 1 in the PRRS-DK column. Likewise, "sanVAC" should give a 1 in
>> the VACsan column, and "sanDK" a 1 in the DKsan column. The first date for
>> each "CHR-nr" should be either the earliest date in the red box (as in the
>> first picture) or the date preceded by the word "før" (Danish for "before"),
>> as in the second picture. All 200 PDF documents look like the ones in the
>> pictures, each representing a different "CHR-nr".
>> 
>> 
>> I hope you can help me.
> 
> Not on the basis of .jpeg files, I think. We'd need some indication of
> what the PDF looks like inside.  There's a tool called pdftotext, which
> might do something for you, IF you can figure out reliably where your
> data begin and end.
> 
> -- 
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
> 

Thank you for your quick response.

Here they are as text files:

http://www.nabble.com/file/p21680833/Foersom%2B-%2B688.txt Foersom+-+688.txt

http://www.nabble.com/file/p21680833/M%25C3%2598LLEVANG%2B602%2B.txt MØLLEVANG+602+.txt
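
One possible way to act on the pdftotext suggestion, sketched in R. Nearly everything here is an assumption rather than something taken from the attached files: the helper names `pdf_to_lines` and `code_status` are hypothetical, the dates are assumed to be written as dd-mm-yyyy, each relevant line is assumed to carry one date and one status label, and the column names use underscores to be valid R names. The example lines are made up for illustration only.

```r
# Hypothetical helper: convert one PDF to text with the external pdftotext
# tool (assumed to be installed and on the PATH) and read the result.
pdf_to_lines <- function(pdf) {
  txt <- tempfile(fileext = ".txt")
  status <- system2("pdftotext", c("-layout", shQuote(pdf), shQuote(txt)))
  if (status != 0) stop("pdftotext failed for ", pdf)
  readLines(txt, warn = FALSE, encoding = "UTF-8")
}

# Hypothetical helper: code the status labels as 0/1 columns.
# ASSUMPTION: every relevant line contains a dd-mm-yyyy date and one label.
code_status <- function(lines, chr_nr) {
  date_pat <- "[0-9]{2}-[0-9]{2}-[0-9]{4}"
  lines <- lines[grepl(date_pat, lines)]          # keep only lines with a date
  dates <- as.Date(regmatches(lines, regexpr(date_pat, lines)),
                   format = "%d-%m-%Y")
  has <- function(pat) as.integer(grepl(pat, lines, perl = TRUE))
  out <- data.frame(
    CHR_nr   = rep(chr_nr, length(lines)),
    Date     = dates,
    PRRS_VAC = has("PRRS-pos VAC|\\bVac\\b"),     # "PRRS-neg" stays 0
    PRRS_DK  = has("PRRS-pos DK|\\bDK\\b"),
    VACsan   = has("sanVAC"),
    DKsan    = has("sanDK"),
    stringsAsFactors = FALSE
  )
  out[order(out$Date), ]                          # earliest date first
}

# Made-up input, not taken from the real files.
example_lines <- c(
  "CHR-nr 12345",
  "01-03-2005 PRRS-neg",
  "15-06-2006 PRRS-pos VAC",
  "20-09-2007 sanVAC",
  "02-01-2008 DK"
)
res <- code_status(example_lines, chr_nr = "12345")
print(res)
```

For all 200 files, the per-file results could be stacked with something like `do.call(rbind, lapply(files, function(f) code_status(pdf_to_lines(f), chr_nr = f)))`, where `files` comes from `list.files(pattern = "\\.pdf$")` and the file name stands in for the CHR-nr until that is parsed from the text itself.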