[R] parsing a data file

Tue Apr 27 12:10:55 CEST 2004

On 27-Apr-04 Tamas Papp wrote:
> I need to parse a data file (output of a measuring device) of the
> following format:
> 
> BEGIN RECORD [first record data] RECORD [second
> record data] RECORD
> [third record data]
> END
> 
> Line breaks can (and do ;-() occur anywhere.  White space behaves very
> much like TeX, eg it is not important whether there are one or more
> spaces or linebreaks as long as there is one of them.  It is a text
> file, not binary.
> 
> I need to extract the record data I marked with []'s, eg a vector such
> as c("[first record data]", "[second]", ...) would be nice as a
> result.
> 
> What functions should I use for this?

I don't know whether there is any R function capable of handling
a format as anarchic as this one, but if you are willing to do the
job outside R (i.e. produce a cleanly structured derived data file
which can then be read by R) then it looks like an awk job (some
might say a perl job). You can use sed to strip the cruft (here,
the BEGIN and END markers).

For example:

cat temp
BEGIN RECORD [first record data] RECORD [second
record data] RECORD
[third record data]
END

sed -e 's/BEGIN//' -e 's/END//' temp | tr '\n' ' ' |
    awk 'BEGIN{RS="RECORD"}{print $0}'

 [first record data] 
 [second record data] 
 [third record data]
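
One caveat on the pipeline above: a multi-character RS such as
"RECORD" is honoured by gawk and mawk, but POSIX leaves the behaviour
unspecified, so other awks may split on the first character only. The
output also keeps an empty first record and the surrounding blanks.
If that matters, something along the following lines might do. It is
only a sketch, and it assumes that BEGIN, RECORD and END never occur
as stand-alone words inside the record data. The function name
parse_records is made up for the illustration, and the here-document
simply recreates the sample file from above.

```shell
# Recreate the sample input shown above (illustration only).
cat > temp <<'EOF'
BEGIN RECORD [first record data] RECORD [second
record data] RECORD
[third record data]
END
EOF

# parse_records FILE  (hypothetical helper name)
# Put every whitespace-separated word on its own line, then glue the
# words back together, starting a new output line at each RECORD.
# Assumes BEGIN, RECORD and END never appear as words inside the data.
parse_records() {
    tr -s ' \t\n' '\n\n\n' < "$1" |
    awk '
        $0 == "" || $0 == "BEGIN" { next }    # skip padding and the opening marker
        $0 == "RECORD" || $0 == "END" {       # a marker closes the current record
            if (rec != "") print rec
            rec = ""
            next
        }
        { rec = (rec == "" ? $0 : rec " " $0) }   # append word, single-spaced
        END { if (rec != "") print rec }          # in case the END marker is missing
    '
}

parse_records temp
```

Because all white space is collapsed before awk sees it, a record comes
out the same however the device happened to break the lines, and each
record sits on one line with single spaces between its words.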

Does this help?
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 27-Apr-04                                       Time: 11:10:55
------------------------------ XFMail ------------------------------