[R] Reading recurring data in a text file

Morway, Eric
Wed Jul 24 20:54:07 CEST 2019


The small reproducible example below works, but it is far too slow on the
real problem.  The real problem involves extracting ~2920 repeated arrays
from a 60 MB file and takes ~80 minutes.  I'm wondering how I might
re-engineer the script to avoid opening and closing the file 2920 times, as
happens now.  That is, is there a way to keep the file open, peel out the
arrays, and stuff them into a list of data.tables, as the small reproducible
example below does, but significantly faster?

wha <- "     INITIAL PRESSURE HEAD
     INITIAL TEMPERATURE SET TO 4.000E+00 DEGREES C
     VS2DH - MedSand for TL test

     TOTAL ELAPSED TIME =  0.000000E+00 sec
     TIME STEP         0

     MOISTURE CONTENT
  Z, IN
  m                       X OR R DISTANCE, IN m
                0.500
     0.075     0.1475
     0.225     0.1475
     0.375     0.1475
     0.525     0.1475
     0.675     0.1475
blah
blah
blah
     TEMPERATURE, IN DECREES C
  Z, IN
  m                       X OR R DISTANCE, IN m
                0.500
     0.075     1.1475
     0.225     2.1475
     0.375     3.1475
     0.525     4.1475
     0.675     5.1475
blah
blah
blah

     TOTAL ELAPSED TIME =  8.6400E+04 sec
     TIME STEP         0

     MOISTURE CONTENT
  Z, IN
  m                       X OR R DISTANCE, IN m
                0.500
     0.075     0.1875
     0.225     0.1775
     0.375     0.1575
     0.525     0.1675
     0.675     0.1475
blah
blah
blah
     TEMPERATURE, IN DECREES C
  Z, IN
  m                       X OR R DISTANCE, IN m
                0.500
     0.075     1.1475
     0.225     2.1475
     0.375     3.1475
     0.525     4.1475
     0.675     5.1475
blah
blah
blah"

example_content <- textConnection(wha)

srchStr1 <- '     MOISTURE CONTENT'
srchStr2 <- 'TEMPERATURE, IN DECREES C'

lines   <- readLines(example_content)
mc_list <- NULL
for (i in seq_along(lines)){
  # Look for start of water content
  if(grepl(srchStr1, lines[i])){
    mc_list <- c(mc_list, i)
  }
}

tmp_list <- NULL
for (i in seq_along(lines)){
  # Look for start of temperature data
  if(grepl(srchStr2, lines[i])){
    tmp_list <- c(tmp_list, i)
  }
}

# Store the water content arrays
wc <- list()
# Read all the moisture content profiles
for(i in seq_along(mc_list)){
  lineNum <- mc_list[i] + 3
  mct <- read.table(text = wha, skip=lineNum, nrows=5,
                    col.names=c('depth','wc'))
  wc[[i]] <- mct
}

# Store the water temperature arrays
tmp <- list()
# Read all the temperature profiles
for(i in seq_along(tmp_list)){
  lineNum <- tmp_list[i] + 3
  tmpt <- read.table(text = wha, skip=lineNum, nrows=5,
                    col.names=c('depth','tmp'))
  tmp[[i]] <- tmpt
}

# quick inspection
length(wc)
wc[[1]]
# Looks like what I'm after, but too slow in real world problem
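
One possible direction, sketched under assumptions rather than tested on the
real 60 MB file: since readLines() already holds the whole file in memory,
each array can be parsed from the matching slice of that character vector
instead of handing the full text back to read.table() with a growing skip.
The helper name extract_blocks, its offset and n_rows arguments, and the
shortened sample text are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: find every header line containing `pattern` and parse
# the n_rows data lines that start `offset + 1` lines below it.
# offset = 3 assumes the layout in the sample: two label lines plus the
# column-position line sit between the header and the first data row.
extract_blocks <- function(lines, pattern, col_names, offset = 3, n_rows = 5) {
  # One vectorized search over all lines; fixed = TRUE treats the pattern
  # as a literal string rather than a regular expression.
  starts <- grep(pattern, lines, fixed = TRUE)
  lapply(starts, function(s) {
    idx <- (s + offset + 1):(s + offset + n_rows)
    # Parse only the in-memory slice; nothing is reopened or re-skipped.
    read.table(text = lines[idx], col.names = col_names)
  })
}

# Shortened stand-in for the real output (two data rows per array).
sample_lines <- c(
  "     TOTAL ELAPSED TIME =  0.000000E+00 sec",
  "     MOISTURE CONTENT",
  "  Z, IN",
  "  m                       X OR R DISTANCE, IN m",
  "                0.500",
  "     0.075     0.1475",
  "     0.225     0.1475",
  "blah",
  "     TEMPERATURE, IN DECREES C",
  "  Z, IN",
  "  m                       X OR R DISTANCE, IN m",
  "                0.500",
  "     0.075     1.1475",
  "     0.225     2.1475",
  "blah"
)

wc_demo  <- extract_blocks(sample_lines, "MOISTURE CONTENT",
                           c("depth", "wc"), n_rows = 2)
tmp_demo <- extract_blocks(sample_lines, "TEMPERATURE, IN DECREES C",
                           c("depth", "tmp"), n_rows = 2)
```

On the real file, the vector returned by a single readLines() call on the
output file would take the place of sample_lines, with n_rows set to the
actual number of rows per array, so the file is opened only once.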
