[R] Getting data from a PDF-file into R

joe1985 johannes at dsr.life.ku.dk
Mon Jan 26 16:11:48 CET 2009


Hello

I have around 200 PDF documents containing data that I want organized in R
as a data frame. The PDF documents look like this:

  http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver.jpeg 

or like this:

http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver%2B2.jpeg 

I want to pull out the data in the coloured boxes so that it ends up
organized like this (just in R instead of Excel):


http://www.nabble.com/file/p21667074/PRRS-billede%2Bexcel.jpeg 

The 0s and 1s encode the status on a particular date. "PRRS-neg" is
represented by a 0 in both the PRRS-VAC and PRRS-DK columns. "PRRS-pos VAC"
or "Vac" is represented by a 1 in the PRRS-VAC column, and "PRRS-pos DK" or
"DK" by a 1 in the PRRS-DK column. Likewise, "sanVAC" should give a 1 in the
VACsan column, and "sanDK" a 1 in the DKsan column. The first date for each
"CHR-nr" should be either the earliest date in the red box (as in the first
picture) or the date preceded by the word "før" (Danish for "before", as in
the second picture). All 200 PDF documents look like the ones in the
pictures, each representing a different "CHR-nr".
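
To make the coding rule concrete, here is a rough sketch in R of the result I
am after. It assumes the text of the coloured boxes has already been pulled
out of the PDFs into a data frame with one row per box; that extraction step
is not shown. The column names chr_nr, date and status, the function name
code_status, the example values, and the column names PRRS_VAC and PRRS_DK
(underscores instead of hyphens, so they are valid R names) are all made up
for illustration. I also assume that any column not set to 1 should be 0.

```r
# Hypothetical input: one row per coloured box, already extracted from the
# PDFs. The identifiers, dates and column names are invented for this sketch.
boxes <- data.frame(
  chr_nr = c("A", "A", "A", "B", "B"),
  date   = as.Date(c("2005-03-01", "2006-07-15", "2007-01-10",
                     "2004-11-20", "2006-02-05")),
  status = c("PRRS-neg", "PRRS-pos VAC", "sanVAC", "DK", "sanDK"),
  stringsAsFactors = FALSE
)

# Turn the status label of each box into 0/1 indicator columns.
# Assumption: a column is 0 unless one of its labels is present.
code_status <- function(boxes) {
  data.frame(
    chr_nr   = boxes$chr_nr,
    date     = boxes$date,
    PRRS_VAC = as.integer(boxes$status %in% c("PRRS-pos VAC", "Vac")),
    PRRS_DK  = as.integer(boxes$status %in% c("PRRS-pos DK", "DK")),
    VACsan   = as.integer(boxes$status == "sanVAC"),
    DKsan    = as.integer(boxes$status == "sanDK"),
    stringsAsFactors = FALSE
  )
}

coded <- code_status(boxes)

# Sort so that the first row for each chr_nr is its earliest date.
coded <- coded[order(coded$chr_nr, coded$date), ]
print(coded)
```

With this layout, "PRRS-neg" simply falls through as a row of zeros, and the
per-document tables can be stacked with rbind() once each PDF has been read.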


I hope you can help me.
-- 
View this message in context: http://www.nabble.com/Getting-data-from-a-PDF-file-into-R-tp21667074p21667074.html
Sent from the R help mailing list archive at Nabble.com.