[R] extracting values from txt file that follow user-supplied quote

emorway emorway at usgs.gov
Fri Jun 8 16:30:06 CEST 2012


I'll summarize the total run times for the suggestions that have been made,
and post the code for anyone who comes across this thread in the future.
Results first; the code that produced them follows.

What I tried to do using suggestions from Bert and Dan:
t1
#   user  system elapsed 
# 208.21    1.68  210.34

Gabor's suggested code:
t2
#   user  system elapsed 
#  51.12    0.63   51.75 

Rui's suggested code:
t3a  #(Get the number of lines)
#   user  system elapsed 
#  45.13   11.08   56.23
t3b  #(now run the function)
#   user  system elapsed 
#  50.59    0.55   51.16

So, in summary, Gabor's and Rui's code run in about the same time when the
number of lines in the file is already known (t2 is roughly equal to t3b).
If the lines have to be counted first, Rui's approach costs t3a + t3b, about
107 seconds elapsed. Gabor's code therefore seems a little more robust, since
it doesn't need the line count at all. Here is the code used to get these
times (note that the file I used was the 1GB text file, not the reduced
version attached to the top post of this thread):

#----------------
#modified attempt
#----------------

library(gsubfn)
library(tcltk2)
p<-"[-0-9]\\S+"   # a digit or minus sign followed by one or more non-space characters
pd<-numeric()
txt_con<-file(description="D:/MCR_BeoPEST - Copy/MCR.out",open="r")
# read one line at a time; keep every number found on matching lines
t1<-system.time(
while (length(txt_line<-readLines(txt_con,n=1))){
  if (length(grep("DISCREPANCY = ",txt_line))) {
    pd<-c(pd,as.numeric(strapplyc(txt_line, p)[[1]]))
  }
})
close(txt_con)
t1
#   user  system elapsed 
# 208.21    1.68  210.34
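
For anyone wondering what the pattern p picks out: it matches a digit or
minus sign followed by at least one more non-space character, so every such
token on a matching line is returned. Below is a minimal base-R sketch of the
same extraction, using regmatches() in place of strapplyc(). The sample line
is made up for illustration; the real layout of MCR.out may differ.

```r
# Hypothetical sample line; the actual layout of MCR.out may differ.
sample_line <- "  PERCENT DISCREPANCY =           0.03     PERCENT DISCREPANCY =          -1.25"
p <- "[-0-9]\\S+"
# regmatches() with gregexpr() returns every match of p in the line
tokens <- regmatches(sample_line, gregexpr(p, sample_line, perl = TRUE))[[1]]
nums <- as.numeric(tokens)
print(nums)
```

Note that a single-character number such as 5 would not be matched, because
the pattern needs at least two characters.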


#----------------------------
#Suggested by G. Grothendieck
#----------------------------

g<-function(txt_con, string, from, to, ...) {
        L <- readLines(txt_con)
        matched <- grep(string, L, value = TRUE, ...)
        as.numeric(substring(matched, from, to))
}
txt_con<-file(description="D:/MCR_BeoPEST - Copy/MCR.out",open="r")
t2<-system.time(
  edm<-g(txt_con, "PERCENT DISCREPANCY = ", 70, 78, fixed = TRUE) 
)
close(txt_con)
t2
#   user  system elapsed 
#  51.12    0.63   51.75 

#-------------------------
#Suggested by Rui Barradas
#-------------------------

library(R.utils)
t3a<-system.time(num_lines<-countLines("D:/MCR_BeoPEST - Copy/MCR.out"))
t3a
#   user  system elapsed 
#  45.13   11.08   56.23


fun <- function(con, pattern, nlines, n=5000L){
        if(is.character(con)){
                con <- file(con, open="rt")
                on.exit(close(con))
        }
        passes <- nlines %/% n
        remaining <- nlines %% n
        res <- NULL
        for(i in seq_len(passes)){
                txt <- readLines(con, n=n)
                res <- c(res, as.numeric(substr(txt[grepl(pattern, txt)], 70, 78)))
        }
        if(remaining){
                txt <- readLines(con, n=remaining)
                res <- c(res, as.numeric(substr(txt[grepl(pattern, txt)], 70, 78)))
        }
        res
}

txt_con<-file(description="D:/MCR_BeoPEST - Copy/MCR.out",open="r")
pat<-"PERCENT DISCREPANCY ="
num_lines <- 14405247L
t3b <- system.time(pd2 <- fun(txt_con, pat, num_lines, 100000L)) 
close(txt_con)
t3b
#   user  system elapsed 
#  50.59    0.55   51.16
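
The chunked function above needs the total line count up front. One possible
variant, which I have not timed on the 1GB file, keeps reading chunks until
readLines() returns nothing, so no count is required. The function name
extract_chunked, its arguments and the small test file are my own
illustration and were not part of anyone's suggestion in this thread.

```r
# Hypothetical helper (not from the thread): read a file in chunks until
# end of file, so the total number of lines need not be known in advance.
extract_chunked <- function(path, pattern, from, to, chunk = 100000L) {
  con <- file(path, open = "rt")
  on.exit(close(con))
  res <- list()
  repeat {
    txt <- readLines(con, n = chunk)
    if (length(txt) == 0L) break   # readLines() returns character(0) at EOF
    hits <- txt[grepl(pattern, txt, fixed = TRUE)]
    # collect per-chunk results in a list instead of growing a vector
    res[[length(res) + 1L]] <- as.numeric(substring(hits, from, to))
  }
  as.numeric(unlist(res))
}

# Small made-up file to exercise the function; the column positions (24 to 30)
# apply to this toy layout only, not to MCR.out.
tmp <- tempfile(fileext = ".out")
writeLines(c("header",
             "  PERCENT DISCREPANCY =   0.12",
             "other",
             "  PERCENT DISCREPANCY =  -1.50"), tmp)
vals <- extract_chunked(tmp, "PERCENT DISCREPANCY = ", 24, 30, chunk = 2L)
print(vals)
unlink(tmp)
```

For the real file the call would presumably use columns 70 and 78, as in the
code above.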


