[R] Trouble pulling data from a messy ASCII file...

Charles C. Berry cberry at tajo.ucsd.edu
Wed Dec 17 21:51:19 CET 2008

On Wed, 17 Dec 2008, Titan8883 wrote:

> Hi all,
> I am a new graduate student who is also new to R. I am ok with the basics,
> but the problem I am having right now seems beyond what I can do..so I am
> looking for advice.

Advice? OK. Here goes.

I would suggest you pull one of the data files into a character vector 
using readLines().
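
A minimal sketch of that first step, assuming a temporary file as a stand-in for one real data file. The lines are copied from the example in the original post; the names sample_lines, tmp and txt are only for illustration.

```r
# Stand-in for one real data file: a few lines copied from the example
# in the original post, written to a temporary file.
sample_lines <- c(
  "10     3.7 version of program",
  "*       radar characteristics",
  "11 WF-100",
  "11 800  nominal PRF, Hz",
  "11  0.25  nominal pulse width, microsec"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Read the whole file into a character vector, one element per line.
txt <- readLines(tmp)

length(txt)  # number of lines read
txt[5]       # inspect a single line by its line number
```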

From there you can try out different methods of finding the data elements 
in the file that you want to extract. If it is guaranteed that 'nominal 
pulse width' ALWAYS shows up on the same line in every file, you can use 
the line numbers to figure out where to look for data elements. If not, 
you will probably want to get familiar with grep() and regular 
expressions, see ?regex and use RSiteSearch("regexpr") and the like to 
turn up the many useful discussions of them on this list.
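
For instance, something along these lines locates a line by its description text rather than by its position. This is only a sketch; the vector txt stands in for what readLines() would return, and its contents are copied from the example in the original post.

```r
# A few lines as readLines() would return them (copied from the example).
txt <- c(
  "11 20000000  A/D rate, samples/second",
  "11 7.5  bin width, m",
  "11 800  nominal PRF, Hz",
  "11  0.25  nominal pulse width, microsec",
  "11 0  tuning, volts"
)

# Index of the line that contains the literal description text.
idx <- grep("nominal pulse width", txt, fixed = TRUE)
hit <- txt[idx]

# A regular expression instead: every line whose record code is 11.
radar <- grep("^11[[:space:]]", txt, value = TRUE)
```

With fixed = TRUE the pattern is matched as a plain string, which avoids surprises if the description text contains regular expression metacharacters.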

From there sub(), gsub(), strsplit(), and friends will help you. They may 
take a good deal of fiddling to get them to digest your data.
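
As a rough illustration, assuming the value is always the second whitespace-separated field on the line (true for the example lines in the original post, but worth checking against the real files):

```r
line <- "11  0.25  nominal pulse width, microsec"

# Split on runs of whitespace: field 1 is the record code, field 2 the value.
fields <- strsplit(line, "[[:space:]]+")[[1]]
pulse_width <- as.numeric(fields[2])

# The same extraction with sub(): keep the second token, drop everything else.
value <- sub("^[0-9]+[[:space:]]+([^[:space:]]+).*$", "\\1", line)
```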

If parts of your file can be read using read.csv() or scan() or something, 
you can use a textConnection() to pass some lines that readLines() has 
stored for you to read.csv().
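
A sketch of the textConnection() idea, here with scan() rather than read.csv(). It assumes the lines of interest start with a numeric record code followed by a numeric value; flush = TRUE tells scan() to discard the description text after those two fields. The vector txt again stands in for the output of readLines().

```r
txt <- c(
  "*       radar characteristics",
  "11 20000000  A/D rate, samples/second",
  "11 7.5  bin width, m",
  "11 800  nominal PRF, Hz"
)

# Keep only the record lines that start with code 11.
keep <- grep("^11[[:space:]]", txt, value = TRUE)

# Hand those lines to scan() through a text connection.
con <- textConnection(keep)
rec <- scan(con, what = list(code = integer(), value = numeric()),
            flush = TRUE, quiet = TRUE)
close(con)

rec$value  # the numeric values, one per record line
```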

Once you can process one data file, rolling up your code as a function 
should not be too hard. Then apply the function to every file, using one of

 	res <- list()
 	for (ifile in your.file.list) res[[ifile]] <- your.function(ifile)

or

 	res <- sapply(your.file.list, your.function)

or

 	res <- lapply(your.file.list, your.function)

and you are ready to chomp away at your files.
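
Putting the pieces together might look roughly like this. The functions get_value(), read_one() and make_file() are hypothetical names, and the two temporary files only stand in for real data files; the sketch assumes each wanted value is the second field on a line identified by its description text.

```r
# Hypothetical helper: numeric value from the first line containing `label`.
get_value <- function(txt, label) {
  hit <- grep(label, txt, fixed = TRUE, value = TRUE)[1]
  as.numeric(strsplit(trimws(hit), "[[:space:]]+")[[1]][2])
}

# Hypothetical per-file function: one file in, one-row data frame out.
read_one <- function(path) {
  txt <- readLines(path)
  data.frame(
    file           = basename(path),
    prf_hz         = get_value(txt, "nominal PRF"),
    pulse_width_us = get_value(txt, "nominal pulse width"),
    stringsAsFactors = FALSE
  )
}

# Two small temporary files standing in for the real data files.
make_file <- function(prf, pw) {
  f <- tempfile(fileext = ".txt")
  writeLines(c("11 WF-100",
               sprintf("11 %s  nominal PRF, Hz", prf),
               sprintf("11  %s  nominal pulse width, microsec", pw)), f)
  f
}
files <- c(make_file(800, 0.25), make_file(1200, 0.5))

# One row per file, one column per extracted value.
res <- do.call(rbind, lapply(files, read_one))
```

A label that is missing from a file yields NA in that column rather than an error, which makes it easier to spot files that do not follow the expected layout.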



> I am trying to pull data from flat ASCII files, but they
> do not have a "nice" structure so a simple "read.table" doesn't work. An
> example first half of a data file is below:
> ----------------------------------------------------------------------------------------------
> 19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
> 10 s   name of program that wrote this file trkplt   name of program that
> wrote this file
> 10 GORDON   machine that generated this file   machine that generated this
> file
> 10     3.7 version of program
> 10     3.6 version of this data file
> 10    5.81 version of Universal Library
> 10 20081121.145730 when this file was written
> 10 Windows_XP   operating system used   operating system used
> *
> *       radar characteristics
> 11 WF-100
> 11 20000000  A/D rate, samples/second
> 11 7.5  bin width, m
> 11 800  nominal PRF, Hz
> 11  0.25  nominal pulse width, microsec
> 11 0  tuning, volts
> 11 3.19779  nominal wave length, cm
> -----------------------------------------------------------------------------------------------
> ..the file goes on from there...
> How would I go about getting this data into some kind of useful format? This
> is one of about 1000 files I will need to go through. I would ideally like
> to get these into a format with each data file as a row with columns for the
> various values with the description text removed (version of program, file
> version, tuning volts, etc...).
> I'm not looking for a cut and paste answer, but perhaps some direction on
> where I should start. I have only done basic .csv, table, and line inputs up
> until now.
> Thanks for any advice

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
