[R] Reading recurring data in a text file

Jeff Newmiller
Wed Jul 24 22:52:51 CEST 2019


?readLines
?grep
?textConnection
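
A minimal sketch of how those three help pages might fit together, assuming each array sits a fixed number of lines below its header as in the quoted example. The idea is to read the file once with readLines(), locate all headers with a single vectorised grep(), and parse each block from the lines already in memory, rather than calling read.table(text = wha, skip = ...) on the whole text for every array. The helper name read_block and the shortened two-row sample are hypothetical, for illustration only.

```r
# Shortened stand-in for the real file: two time steps, two data rows each.
wha <- "     TOTAL ELAPSED TIME =  0.000000E+00 sec
     MOISTURE CONTENT
  Z, IN
  m                       X OR R DISTANCE, IN m
                0.500
     0.075     0.1475
     0.225     0.1475
blah
     TEMPERATURE, IN DECREES C
  Z, IN
  m                       X OR R DISTANCE, IN m
                0.500
     0.075     1.1475
     0.225     2.1475
blah
     TOTAL ELAPSED TIME =  8.6400E+04 sec
     MOISTURE CONTENT
  Z, IN
  m                       X OR R DISTANCE, IN m
                0.500
     0.075     0.1875
     0.225     0.1775
blah
     TEMPERATURE, IN DECREES C
  Z, IN
  m                       X OR R DISTANCE, IN m
                0.500
     0.075     1.1475
     0.225     2.1475
blah"

# Read every line exactly once. For a real file, readLines("yourfile") would
# replace the text connection (the file name here is a placeholder).
con   <- textConnection(wha)
lines <- readLines(con)
close(con)

# grep() is vectorised: one call returns the line numbers of all headers.
mc_start  <- grep("MOISTURE CONTENT", lines, fixed = TRUE)
tmp_start <- grep("TEMPERATURE, IN DECREES C", lines, fixed = TRUE)

# Parse one block from the in-memory lines. The data rows are assumed to
# begin 4 lines below the header, matching the "+ 3" skip in the original.
read_block <- function(start, value_name, n_rows = 2) {
  idx <- seq(start + 4, length.out = n_rows)
  read.table(text = lines[idx], col.names = c("depth", value_name))
}

wc  <- lapply(mc_start,  read_block, value_name = "wc")
tmp <- lapply(tmp_start, read_block, value_name = "tmp")

length(wc)
wc[[2]]
```

With the quoted five-row arrays, n_rows = 5 would be the matching setting. The resulting data frames could be converted with data.table::as.data.table() if data.tables are needed; that step is left out here to keep the sketch in base R.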

On July 24, 2019 11:54:07 AM PDT, "Morway, Eric via R-help" <r-help using r-project.org> wrote:
>The small reproducible example below works, but is way too slow on the real
>problem.  The real problem is attempting to extract ~2920 repeated arrays
>from a 60 Mb file and takes ~80 minutes.  I'm wondering how I might
>re-engineer the script to avoid opening and closing the file 2920 times as
>is the case now.  That is, is there a way to keep the file open and peel
>out the arrays and stuff them into a list of data.tables, as is done in the
>small reproducible example below, but in a significantly faster way?
>
>wha <- "     INITIAL PRESSURE HEAD
>     INITIAL TEMPERATURE SET TO 4.000E+00 DEGREES C
>     VS2DH - MedSand for TL test
>
>     TOTAL ELAPSED TIME =  0.000000E+00 sec
>     TIME STEP         0
>
>     MOISTURE CONTENT
>  Z, IN
>  m                       X OR R DISTANCE, IN m
>                0.500
>     0.075     0.1475
>     0.225     0.1475
>     0.375     0.1475
>     0.525     0.1475
>     0.675     0.1475
>blah
>blah
>blah
>     TEMPERATURE, IN DECREES C
>  Z, IN
>  m                       X OR R DISTANCE, IN m
>                0.500
>     0.075     1.1475
>     0.225     2.1475
>     0.375     3.1475
>     0.525     4.1475
>     0.675     5.1475
>blah
>blah
>blah
>
>     TOTAL ELAPSED TIME =  8.6400E+04 sec
>     TIME STEP         0
>
>     MOISTURE CONTENT
>  Z, IN
>  m                       X OR R DISTANCE, IN m
>                0.500
>     0.075     0.1875
>     0.225     0.1775
>     0.375     0.1575
>     0.525     0.1675
>     0.675     0.1475
>blah
>blah
>blah
>     TEMPERATURE, IN DECREES C
>  Z, IN
>  m                       X OR R DISTANCE, IN m
>                0.500
>     0.075     1.1475
>     0.225     2.1475
>     0.375     3.1475
>     0.525     4.1475
>     0.675     5.1475
>blah
>blah
>blah"
>
>example_content <- textConnection(wha)
>
>srchStr1 <- '     MOISTURE CONTENT'
>srchStr2 <- 'TEMPERATURE, IN DECREES C'
>
>lines   <- readLines(example_content)
>mc_list <- NULL
>for (i in 1:length(lines)){
>  # Look for start of water content
>  if(grepl(srchStr1, lines[i])){
>    mc_list <- c(mc_list, i)
>  }
>}
>
>tmp_list <- NULL
>for (i in 1:length(lines)){
>  # Look for start of temperature data
>  if(grepl(srchStr2, lines[i])){
>    tmp_list <- c(tmp_list, i)
>  }
>}
>
># Store the water content arrays
>wc <- list()
># Read all the moisture content profiles
>for(i in 1:length(mc_list)){
>  lineNum <- mc_list[i] + 3
>  mct <- read.table(text = wha, skip=lineNum, nrows=5,
>                    col.names=c('depth','wc'))
>  wc[[i]] <- mct
>}
>
># Store the water temperature arrays
>tmp <- list()
># Read all the temperature profiles
>for(i in 1:length(tmp_list)){
>  lineNum <- tmp_list[i] + 3
>  tmpt <- read.table(text = wha, skip=lineNum, nrows=5,
>                    col.names=c('depth','tmp'))
>  tmp[[i]] <- tmpt
>}
>
># quick inspection
>length(wc)
>wc[[1]]
># Looks like what I'm after, but too slow in real world problem

-- 
Sent from my phone. Please excuse my brevity.


