[R] help reading a variably formatted text file

Corey Moffet cmoffet at nwrc.ars.usda.gov
Tue Nov 19 19:11:41 CET 2002


Dear R-Help,

I have a generated file that looks like the following:

----- Begin file -----
 #
 #       Output File
 #
 float   Version      2002.700000000000
 int     Numdays         31
 int     NumOFEs          1
 #
 #       Hillslope-specific variables
 #
 char    HillVarNames[ 3 ]
         {Days In Simulation}                         
         {Hillslope: Precipitation (mm)}              
         {Hillslope: Average detachment (kg/m**2)}    
 #
 #       OFE-specific variables
 #
 char    OFEVarNames[ 3 ]
         {Irrigation depth (mm)}                      
         {Irrigation_volume_supplied/unit_area (mm)}  
         {Runoff (mm)}                                
 #
 #       Daily values:
 #
     1    5.40000    0.00000    0.00000    0.00000    0.00000
     2    0.00000    0.00000    0.00000    0.00000    0.00000
     3    2.30000    0.00000    0.00000    0.00000    0.00000
     4    0.00000    0.00000    0.00000    0.00000    0.00000
     5    0.00000    0.00000    0.00000    0.00000    0.00000
     6    0.00000    0.00000    0.00000    0.00000    0.00000
     7    0.00000    0.00000    0.00000    0.00000    0.00000
     8    0.00000    0.00000    0.00000    0.00000    0.00000
     9   12.80000    0.00000    0.00000    4.57200    0.00000
    10    0.00000    0.00000    0.00000    0.00000    0.00000
    11    0.00000    0.00000    0.00000    0.00000    0.00000
    12    0.00000    0.00000    0.00000    0.00000    0.00000
    13    0.00000    0.00000    0.00000    0.00000    0.00000
    14    0.00000    0.00000    0.00000    0.00000    0.00000
    15    0.00000    0.00000    0.00000    0.00000    0.00000
    16    0.00000    0.00000    0.00000    0.00000    0.00000
    17    0.00000    0.00000    0.00000    0.00000    0.00000
    18    0.00000    0.00000    0.00000    0.00000    0.00000
    19    0.00000    0.00000    0.00000    0.00000    0.00000
    20    0.00000    0.00000    0.00000    0.00000    0.00000
    21    0.00000    0.00000    0.00000    0.00000    0.00000
    22    0.00000    0.00000    0.00000    0.00000    0.00000
    23    0.00000    0.00000    0.00000    0.00000    0.00000
    24    0.00000    0.00000    0.00000    0.00000    0.00000
    25    0.00000    0.00000    0.00000    0.00000    0.00000
    26    0.00000    0.00000    0.00000    0.00000    0.00000
    27    0.00000    0.00000    0.00000    0.00000    0.00000
    28    0.00000    0.00000    0.00000    0.00000    0.00000
    29   32.30000    0.00001    0.00001    4.57200    0.00000
    30    0.00000    0.00000    0.00000    0.00000    0.00000
    31    0.00000    0.00000    0.00000    0.00000    0.00000
 #
 #       Minimum/Maximum values:
 #
     1    0.00000    0.00000    0.00000    0.00000    0.00000
    63   32.30000    0.00001    0.00001    4.57200    0.00000

----- end file -----

Note: Spaces in the first column are real.

I would like to read in a data.frame containing only the data between:

" #
 #       Daily values:
 #"
and 
" #
 #       Minimum/Maximum values:
 #"

but the number of columns in the dataset will vary.  The information 
describing how it varies is contained in the sections:

" char    HillVarNames[ 3 ]
         {Days In Simulation}                         
         {Hillslope: Precipitation (mm)}              
         {Hillslope: Average detachment (kg/m**2)}"
and 

" char    OFEVarNames[ 3 ]
         {Irrigation depth (mm)}                      
         {Irrigation_volume_supplied/unit_area (mm)}  
         {Runoff (mm)}"

The number of columns is the sum of the HillVarNames and OFEVarNames counts
(6 here), and the column labels are the entries listed under each of those
two headings.
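
For illustration, here is a minimal sketch of how the counts and labels
could be pulled out of those two sections.  It assumes the header lines have
already been read into a character vector (the name `hdr` is made up for this
example) and that every label sits alone on its line inside braces.

```r
# Hypothetical input: the two name sections copied from the file above.
hdr <- c(
  " char    HillVarNames[ 3 ]",
  "         {Days In Simulation}                         ",
  "         {Hillslope: Precipitation (mm)}              ",
  "         {Hillslope: Average detachment (kg/m**2)}    ",
  " #",
  " #       OFE-specific variables",
  " #",
  " char    OFEVarNames[ 3 ]",
  "         {Irrigation depth (mm)}                      ",
  "         {Irrigation_volume_supplied/unit_area (mm)}  ",
  "         {Runoff (mm)}                                "
)

# Declared counts: the integer between "[" and "]" on each "char" line.
decl   <- grep("^ *char ", hdr, value = TRUE)
counts <- as.integer(sub("^.*\\[ *([0-9]+) *\\].*$", "\\1", decl))

# Labels: lines whose first non-blank character is "{"; keep what is
# inside the braces, in file order.
lab_lines <- grep("^ *\\{", hdr, value = TRUE)
labels    <- sub("^ *\\{(.*)\\} *$", "\\1", lab_lines)

ncols <- sum(counts)                 # total number of data columns
stopifnot(length(labels) == ncols)   # every column should have a label
```

With the labels in one vector, their positions are also the column positions
in the data block, provided the Hillslope names always come before the OFE
names as they do in this file.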

Depending on options in the model run that generates this file, the number
of columns can change.  I would like to write a function that reads the file
and makes a data.frame with two columns, day and runoff (in this case columns
1 and 6 of the file).  If I can parse the variable names into a vector, I can
determine which elements hold {Days In Simulation} and {Runoff (mm)}, but I am
having trouble finding a function that will let me read in parts of the file
and use information gathered along the way to direct additional reading.
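
One way such a function might look is sketched below.  It rests on several
assumptions: the whole file fits in memory, the data block is bounded by the
"Daily values:" and "Minimum/Maximum values:" comment lines, each of those
markers has a bare " #" line on either side, and the labels appear in column
order.  The names `read_day_runoff` and `sample_lines` are invented for the
example, and the sample keeps only three of the 31 data rows.

```r
# Hypothetical helper: day and runoff from the lines of one output file.
read_day_runoff <- function(lines) {
  # Column labels, in file order, from the lines that start with "{".
  labels <- sub("^ *\\{(.*)\\} *$", "\\1",
                grep("^ *\\{", lines, value = TRUE))
  daycol <- match("Days In Simulation", labels)
  runcol <- match("Runoff (mm)", labels)

  # Data rows lie between the two markers; +2 and -2 step over the
  # bare " #" line that sits next to each marker.
  first <- grep("Daily values:", lines, fixed = TRUE) + 2
  last  <- grep("Minimum/Maximum values:", lines, fixed = TRUE) - 2

  con <- textConnection(lines[first:last])
  on.exit(close(con))
  dat <- read.table(con, header = FALSE)

  data.frame(day = dat[[daycol]], runoff = dat[[runcol]])
}

# Abbreviated copy of the file above (days 1, 9 and 29 only).
sample_lines <- c(
  " #", " #       Output File", " #",
  " float   Version      2002.700000000000",
  " int     Numdays         31",
  " int     NumOFEs          1",
  " #", " #       Hillslope-specific variables", " #",
  " char    HillVarNames[ 3 ]",
  "         {Days In Simulation}",
  "         {Hillslope: Precipitation (mm)}",
  "         {Hillslope: Average detachment (kg/m**2)}",
  " #", " #       OFE-specific variables", " #",
  " char    OFEVarNames[ 3 ]",
  "         {Irrigation depth (mm)}",
  "         {Irrigation_volume_supplied/unit_area (mm)}",
  "         {Runoff (mm)}",
  " #", " #       Daily values:", " #",
  "     1    5.40000    0.00000    0.00000    0.00000    0.00000",
  "     9   12.80000    0.00000    0.00000    4.57200    0.00000",
  "    29   32.30000    0.00001    0.00001    4.57200    0.00000",
  " #", " #       Minimum/Maximum values:", " #",
  "     1    0.00000    0.00000    0.00000    0.00000    0.00000",
  "    63   32.30000    0.00001    0.00001    4.57200    0.00000"
)

res <- read_day_runoff(sample_lines)
```

For a real file the lines would come from readLines() on the file name.
Because the columns are located by label rather than by position, the same
call should keep working when the model options add or remove variables.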

The procedure I envision will look like this:

(1) skip first 9 lines
(2) read 3rd word in next line and assign to variable hillvarnames
(3) read hillvarnames more lines
(4) test which line has the value {Days In Simulation} and assign index to
daycolumn.
(5) skip 3 lines
(6) read 3rd word in next line and assign to variable ofevarnames
(7) read ofevarnames more lines
(8) test which line has the value {Runoff (mm)} and assign
index+hillvarnames to runoffcolumn.
(9) skip 3 lines
(10) read lines until 5 lines remain and assign the values in the daycolumn
and runoffcolumn columns to a data.frame with columns day and runoff.
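
The ten steps above can be sketched fairly literally with an open
connection, since each readLines() call on an open connection continues from
where the previous call stopped.  This is only a sketch under assumptions:
the function name `read_by_steps` and the sample are made up, the sample
keeps three of the 31 data rows, and the file is taken to end right after
the two Minimum/Maximum rows with no trailing blank line.

```r
# Hypothetical helper following steps (1) to (10); con must already be open.
read_by_steps <- function(con) {
  third_word <- function(line) {
    words <- strsplit(sub("^ +", "", line), " +")[[1]]
    as.integer(words[3])
  }
  strip <- function(x) sub("^ *(.*[^ ]) *$", "\\1", x)  # drop outer blanks

  readLines(con, n = 9)                                   # (1)
  hillvarnames <- third_word(readLines(con, n = 1))       # (2)
  hillnames    <- strip(readLines(con, n = hillvarnames)) # (3)
  daycolumn    <- match("{Days In Simulation}", hillnames)        # (4)
  readLines(con, n = 3)                                   # (5)
  ofevarnames  <- third_word(readLines(con, n = 1))       # (6)
  ofenames     <- strip(readLines(con, n = ofevarnames))  # (7)
  runoffcolumn <- match("{Runoff (mm)}", ofenames) + hillvarnames # (8)
  readLines(con, n = 3)                                   # (9)
  rest <- readLines(con)                                  # (10) read to end,
  rows <- rest[seq_len(length(rest) - 5)]                 # then drop last 5
  vals <- strsplit(sub("^ +", "", rows), " +")
  data.frame(
    day    = as.integer(sapply(vals, function(v) v[daycolumn])),
    runoff = as.numeric(sapply(vals, function(v) v[runoffcolumn]))
  )
}

# Abbreviated copy of the file above (days 1, 9 and 29 only).
sample_lines <- c(
  " #", " #       Output File", " #",
  " float   Version      2002.700000000000",
  " int     Numdays         31",
  " int     NumOFEs          1",
  " #", " #       Hillslope-specific variables", " #",
  " char    HillVarNames[ 3 ]",
  "         {Days In Simulation}                         ",
  "         {Hillslope: Precipitation (mm)}              ",
  "         {Hillslope: Average detachment (kg/m**2)}    ",
  " #", " #       OFE-specific variables", " #",
  " char    OFEVarNames[ 3 ]",
  "         {Irrigation depth (mm)}                      ",
  "         {Irrigation_volume_supplied/unit_area (mm)}  ",
  "         {Runoff (mm)}                                ",
  " #", " #       Daily values:", " #",
  "     1    5.40000    0.00000    0.00000    0.00000    0.00000",
  "     9   12.80000    0.00000    0.00000    4.57200    0.00000",
  "    29   32.30000    0.00001    0.00001    4.57200    0.00000",
  " #", " #       Minimum/Maximum values:", " #",
  "     1    0.00000    0.00000    0.00000    0.00000    0.00000",
  "    63   32.30000    0.00001    0.00001    4.57200    0.00000"
)

con <- textConnection(sample_lines)  # a real file would use file(..., open = "r")
res <- read_by_steps(con)
close(con)
```

Counting fixed numbers of lines makes this version sensitive to any change
in the header layout; the Numdays value in the header could be used instead
of "all but the last 5 lines" to decide how many data rows to read.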

Is this a reasonable thing to do in R?  Are there some functions that 
will make this task less difficult?  Is there a function that allows you to 
read a small amount of information, parse it, test it, and then begin reading 
again where it left off?

I am using the following R version:
         _              
platform i386-pc-mingw32
arch     i386           
os       mingw32        
system   i386, mingw32  
status                  
major    1              
minor    6.1            
year     2002           
month    11             
day      01             
language R              

Thank you in advance.

With best wishes and kind regards I am

Sincerely,

Corey A. Moffet
Support Scientist

University of Idaho
Northwest Watershed Research Center
800 Park Blvd, Plaza IV, Suite 105
Boise, ID 83712-7716
(208) 422-0718