[R] extracting values from txt file that follow user-supplied quote

Rainer Schuermann rainer.schuermann at gmx.net
Wed Jun 6 19:34:18 CEST 2012


R may not be the best tool for this. 
Did you look at gawk? It is also available for Windows:
http://gnuwin32.sourceforge.net/packages/gawk.htm

Once gawk has written a new file that contains only the lines / data you want, you could use R for the next steps.
You can also run gawk from within R with the system() command.
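
A minimal sketch of that last idea, under a few assumptions that the original post does not confirm: that a gawk binary is on the PATH, and that the wanted number is the last whitespace-separated field on each matching line. The helper name extract_discrepancy is made up for illustration.

```r
# Hypothetical helper: let gawk do the filtering, then parse the result in R.
# Assumptions: `awk` names an awk binary on the PATH, and the value of
# interest is the last whitespace-separated field of each matching line.
extract_discrepancy <- function(path, awk = "gawk") {
  # awk program: for every line containing the key phrase, print its last field
  prog <- "/PERCENT DISCREPANCY =/ { print $NF }"
  # shQuote() protects the program text and any spaces in the file path
  cmd <- paste(awk, shQuote(prog), shQuote(path))
  # intern = TRUE returns gawk's output as a character vector, one element per line
  out <- system(cmd, intern = TRUE)
  as.numeric(out)
}
```

Something like pd <- extract_discrepancy("D:/MCR_BeoPEST - Copy/MCR.out") would then leave only the matching values for R to handle; gawk is the only program that reads the whole file. If the value sits in fixed columns rather than in the last field, the awk program would need substr() instead of $NF.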

Rgds,
Rainer


On Wednesday 06 June 2012 09:54:15 emorway wrote:
> useRs- 
> 
> I'm attempting to scan a text file larger than 1 GB and to read and store the
> values that follow a specific key phrase that is repeated multiple times
> throughout the file.  A snippet of the text file I'm trying to read is
> attached.  The text file is a dumping ground for various aspects of the
> performance of the model that generates it.  Thus, the information I want
> to extract from the file is not in a fixed position (i.e. it does not
> appear in a predictable location, like line 1000 or 2000).  Rather, the
> desired values always follow a specific phrase:
> "   PERCENT DISCREPANCY ="
> 
> One approach I took was the following:
> 
> library(R.utils)
> 
> txt_con<-file(description="D:/MCR_BeoPEST - Copy/MCR.out",open="r")
> #The above will need to be altered if one desires to test code on the
> #attached txt file, which will run much quicker
> system.time(num_lines<-countLines("D:/MCR_BeoPEST - Copy/MCR.out"))
> #elapsed time on full 1 GB file took about 55 seconds on a 3.6 GHz Xeon
> num_lines
> #14405247
> 
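> pd<-numeric(0)  #initialize the result vector before appending to it
> system.time(
> for(i in 1:num_lines){
>   txt_line<-readLines(txt_con,n=1)
>   if (length(grep("    PERCENT DISCREPANCY =",txt_line))) {
>     pd<-c(pd,as.numeric(substr(txt_line,70,78)))
>   }
> }
> )
> #Time took about 5 minutes
> close(txt_con)  #release the file connection once the loop is done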
> 
> The inefficiencies in this approach arise due to reading the file twice
> (first to get num_lines, then to step through each line looking for the
> desired text).  
> 
> Is there a way to speed this process up using scan() (see ?scan)?  I
> wasn't able to get anything working, but what I had in mind was to scan through
> the more than 1 GB file and, when the key phrase
> (e.g.  "     PERCENT DISCREPANCY =  ") is encountered, read and store the next
> 13 characters (which will include some white space) as a numeric value, then
> resume the scan until the key phrase is encountered again, repeating until the
> end-of-file marker is encountered.  Is such an approach even possible, or
> is line-by-line the best bet?
> 
> http://r.789695.n4.nabble.com/file/n4632558/MCR.out MCR.out 
> 
> 
> 


