[R] Narrowing values collected from .txt file

jim holtman jholtman at gmail.com
Thu Aug 29 14:40:15 CEST 2013


Here is how I would do it, since you are reading in the entire file anyway.
This splits the input at each "Flow Budget" header, extracts the RECHARGE
values, and stores them in a list named by the Flow Budget header:


> # read entire file
> input <- readLines("C:\\Users\\jh52822\\Downloads\\MCR_Budgets.txt")
> # determine the lines of interest
> indx <- grep("Flow Budget for Zone|RECHARGE =", input)
> # remove everything else
> input <- input[indx]
> # split by Flow Budget
> sep <- split(input, cumsum(grepl("Flow Budget", input)))
> # process the list extracting data
> result <- lapply(sep, function(.lines){
+     as.numeric(sub(".*=(.*)", "\\1", .lines[-1]))
+ })
>
> # extract the names for each Flow
> fNames <- sapply(sep, '[', 1)
>
> # add to the list
> names(result) <- fNames
>  result
$`     Flow Budget for Zone  1 at Time Step   1 of Stress Period   2`
[1] 128980      0      0      0

$`     Flow Budget for Zone  2 at Time Step   1 of Stress Period   2`
[1] 274160      0      0      0

$`     Flow Budget for Zone  3 at Time Step   1 of Stress Period   2`
[1] 81084     0     0     0

$`     Flow Budget for Zone  1 at Time Step   1 of Stress Period   3`
[1] 128980      0      0      0

$`     Flow Budget for Zone  2 at Time Step   1 of Stress Period   3`
[1] 274160      0      0      0

$`     Flow Budget for Zone  3 at Time Step   1 of Stress Period   3`
[1] 81084     0     0     0

$`     Flow Budget for Zone  1 at Time Step   1 of Stress Period   4`
[1] 128980      0      0      0

$`     Flow Budget for Zone  2 at Time Step   1 of Stress Period   4`
[1] 274160      0      0      0

$`     Flow Budget for Zone  3 at Time Step   1 of Stress Period   4`
[1] 81084     0     0     0

$`     Flow Budget for Zone  1 at Time Step   1 of Stress Period   5`
[1] 128980      0      0      0

$`     Flow Budget for Zone  2 at Time Step   1 of Stress Period   5`
[1] 274160      0      0      0

$`     Flow Budget for Zone  3 at Time Step   1 of Stress Period   5`
[1] 81084     0     0     0

$`     Flow Budget for Zone  1 at Time Step   1 of Stress Period   6`
[1] 128980      0      0      0

$`     Flow Budget for Zone  2 at Time Step   1 of Stress Period   6`
[1] 274160      0      0      0

$`     Flow Budget for Zone  3 at Time Step   1 of Stress Period   6`
[1] 81084     0     0     0

$`     Flow Budget for Zone  1 at Time Step   1 of Stress Period   7`
[1] 128980      0      0      0
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
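
The grouping trick in Jim's code above can be tried without the budget file. Below is a minimal, self-contained sketch of the same grep/split/cumsum approach. The sample lines are hypothetical: they only imitate the file's layout (a "Flow Budget for Zone" header followed by "RECHARGE =" lines), and the numbers carry no meaning.

```r
# Hypothetical sample imitating the budget file layout (not real model output)
input <- c(
  "     Flow Budget for Zone  1 at Time Step   1 of Stress Period   2",
  "               STORAGE =           0.0000",
  "              RECHARGE =         100.0000",
  "     Flow Budget for Zone  2 at Time Step   1 of Stress Period   2",
  "              RECHARGE =         200.0000",
  "              RECHARGE =           0.0000"
)

# keep only the header lines and the RECHARGE lines
input <- input[grep("Flow Budget for Zone|RECHARGE =", input)]

# cumsum() over the header flags gives each line the number of the most
# recent header, so split() groups every header with the lines after it
groups <- split(input, cumsum(grepl("Flow Budget", input)))

# drop the header (first element) and parse the number after "="
result <- lapply(groups, function(.lines) {
  as.numeric(sub(".*=(.*)", "\\1", .lines[-1]))
})

# name each element by its header, without the leading blanks
names(result) <- trimws(sapply(groups, `[`, 1))

str(result)
```

Because `cumsum(grepl(...))` counts headers seen so far, any lines appearing before the first header would fall into a group numbered 0; that cannot happen in this sample because the first retained line is a header.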


On Wed, Aug 28, 2013 at 7:45 PM, Morway, Eric <emorway at usgs.gov> wrote:
> It looks as though the attachment to my last post didn't make the cut (or
> at least it's not appearing on the Nabble forum), for one reason or
> another.  I'm reattaching a smaller version so folks can run the code
> (it won't work without a text file to operate on).  While the attached
> file is only a small sample of the larger file and will therefore run
> quickly, it would still be helpful if someone knows a more efficient
> approach than the code in the previous post.
>
>
> On Wed, Aug 28, 2013 at 11:28 AM,
>
>>
>> A relatively concise, commented, working solution to the problem
>> originally motivating this thread is below.  I suspect the approach
>> I've taken has a major inefficiency: the "scan" call inside the
>> function "g".  As written, the code has to re-open and re-read the
>> file length(matched) times rather than reading sequentially through
>> to the next pertinent section of the txt file.  Does anyone have a
>> more efficient approach in mind so I don't have to wait half an hour
>> for the results? (The only change needed to run the code below is to
>> point "txt" to wherever the attached file is saved.)
>>
>>
>> # where is the file?
>> txt<-"c:/temp/MCR_Budgets.txt"
>>
>> # Demarcation header
>> hdr_str<-"Flow Budget for Zone  2"
>>
>> # string to identify lines with desired values
>> srch_str<-"  RECHARGE ="
>>
>> # retrieves desired values
>> g<-function(txt_con, hdr_str, srch_str, from, to, ...) {
>>
>>     L <- readLines(txt_con)
>>
>>     #matched contains the line #s w/ hdr_str
>>     matched <- grep(hdr_str, L, value = FALSE, ...)
>>
>>     #initialize output vector
>>     fetched_list<-numeric()
>>
>>     #for each instance of hdr_str, loop
>>     #(seq_along avoids looping over 1:0 when nothing matches)
>>     for(i in seq_along(matched)){
>>
>>       #retrieve a section of text following each hdr_str
>>       #(each scan() call re-opens the file and skips from the top)
>>       snippet<-scan(txt_con, what=character(), skip=matched[i]-1, n=42,
>>                     sep='\n')
>>
>>       #get data within the short section of retrieved text
>>       fetched <- grep(srch_str, snippet, value=TRUE)
>>
>>       #append output vector for plotting time series
>>       fetched_list <- c(fetched_list,
>>                         as.numeric(substring(fetched, from, to)))
>>
>>       #monitor
>>       print(i)
>>     }
>>
>>     #return desired values
>>     as.numeric(fetched_list)
>> }
>>
>> #The system.time results below are for the full 147 MB file,
>> # only half of which was attached.
>> system.time(
>>   rech_z2<-g(txt,hdr_str,srch_str,37,51)
>> )
>> #   user  system elapsed
>> #1740.48   36.08 1825.77
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


