[R] Trouble pulling data from a messy ASCII file...

jim holtman jholtman at gmail.com
Fri Dec 19 03:49:18 CET 2008


Here is an example of some code that might do it for you:

> input <- readLines(textConnection("19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
+ 10 s   name of program that wrote this file trkplt   name of program that wrote this file
+ 10 GORDON   machine that generated this file   machine that generated this file
+ 10     3.7 version of program
+ 10     3.6 version of this data file
+ 10    5.81 version of Universal Library
+ 10 20081121.145730 when this file was written
+ 10 Windows_XP   operating system used   operating system used
+ *
+ *       radar characteristics
+ 11 WF-100
+ 11 20000000  A/D rate, samples/second
+ 11 7.5  bin width, m
+ 11 800  nominal PRF, Hz
+ 11  0.25  nominal pulse width, microsec
+ 11 0  tuning, volts
+ 11 3.19779  nominal wave length, cm"))
> closeAllConnections()
>
> # split each line into: code, value, description (joined by "`", then split)
> f.parse <- function(line){
+     x <- sub("^(\\S+)\\s+(\\S+)\\s*(.*)", "\\1`\\2`\\3", line)
+     unlist(strsplit(x, "`"))
+ }
>
> fileName <- ''   # set by each '19' line
> result <- NULL
> for (i in input){
+     values <- f.parse(i)
+     switch(values[1],
+         '19'={fileName <- values[2]},   # path of the data file
+         '*'=NULL,   # ignore comments
+         '10'=,      # '10' falls through to the '11' action
+         '11'={result <- rbind(result, c(fileName, values[3], values[2]))}
+     )
+ }
> # convert to a long (fileName, variable, value) data frame for 'cast'
> result <- as.data.frame(result, stringsAsFactors=FALSE)
> names(result) <- c('fileName', 'variable', 'value')
> require(reshape)
> cast(result, fileName ~ variable, c)
                                                     fileName A/D rate, samples/second bin width, m
1 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt                 20000000          7.5
  machine that generated this file   machine that generated this file
1                                                              GORDON
  name of program that wrote this file trkplt   name of program that wrote this file nominal PRF, Hz
1                                                                                  s             800
  nominal pulse width, microsec nominal wave length, cm operating system used   operating system used
1                          0.25                 3.19779                                    Windows_XP
  tuning, volts version of program version of this data file version of Universal Library
1             0                3.7                       3.6                         5.81
  when this file was written     NA
1            20081121.145730 WF-100
>
>
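The question quoted below mentions about 1000 files, so the same parsing can be wrapped in a function and applied to each file in turn. This is only a sketch under assumptions: `parse_header()` and `parse_all()` are hypothetical names, the regular expression assumes the layout of the sample file (a `19` line with the path, `10`/`11` lines holding a value followed by a description, `*` comment lines), and it swaps reshape's `cast()` for base R's `tapply()` to turn the long rows into one row per file.

```r
# Hypothetical helper (not from the post above): parse one header file into
# long format, one row per (fileName, variable, value).
# Assumes the layout of the sample file: a "19 <path>" line naming the file,
# "10"/"11" lines holding "<value> <description>", and "*" comment lines.
parse_header <- function(path) {
  fileName <- basename(path)   # fallback if no "19" line precedes the data
  rows <- list()
  pattern <- "^([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]*(.*)$"
  for (line in readLines(path)) {
    m <- regmatches(line, regexec(pattern, line))[[1]]
    if (length(m) == 0) next            # e.g. a bare "*" line
    code  <- m[2]
    value <- m[3]
    desc  <- trimws(m[4])
    if (code == "19") {
      fileName <- basename(value)       # keep just the file name
    } else if (code %in% c("10", "11")) {
      rows[[length(rows) + 1]] <- c(fileName, desc, value)
    }                                   # "*" comments and other codes are skipped
  }
  if (length(rows) == 0) return(NULL)
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c("fileName", "variable", "value")
  out
}

# Parse every file, then pivot: one row per file, one column per variable.
# tapply() leaves NA where a file lacks a variable; v[1] keeps the first
# value if a variable appears twice in one file.
parse_all <- function(files) {
  long <- do.call(rbind, lapply(files, parse_header))
  wide <- tapply(long$value, list(long$fileName, long$variable), function(v) v[1])
  as.data.frame(wide, stringsAsFactors = FALSE)
}

# Example use (directory taken from the sample path; adjust the pattern so it
# matches only the header files):
# files <- list.files("c:/data/WF-100", pattern = "\\.txt$",
#                     recursive = TRUE, full.names = TRUE)
# wide  <- parse_all(files)
# write.csv(wide, "headers.csv")
```

All values stay as text; convert the columns you want to analyse with `as.numeric()` afterwards.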


On Wed, Dec 17, 2008 at 12:21 PM, Titan8883 <jplaney at gmail.com> wrote:
>
> The output I would be looking for would be one row for each data file with
> columns for each variable, so using a .csv example with a few variables
> would be:
> -------------------------------------------------------------------------
> File_name,date_written,program_ver,data_file_ver,bin_width
> 20080911.013115.007.17.txt, 20081121.145730,3.7,3.6,7.5
> --------------------------------------------------------------------------
> My plan is to create a table with all the data files listed. This would
> allow me to find mean/min/max values for different variables, sort by a
> certain variable, etc. I am not limiting myself to R; I have seen awk
> mentioned before, so that sounds worth looking at to prep the
> data.
>
> Hope that helps.
>
>
>
>
>
> jholtman wrote:
>>
>> It would be helpful if you could show what the output would be for the
>> example given.  Exactly what are the 'values', and what would be the
>> 'headings'?  As mentioned before, you can use readLines and then parse
>> the data you want; something like Perl might be easier, but it is
>> hard to tell from the mail.
>>
>> On Wed, Dec 17, 2008 at 2:37 PM, Titan8883 <jplaney at gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I am a new graduate student who is also new to R. I am OK with the basics,
>>> but the problem I am having right now seems beyond what I can do, so I am
>>> looking for advice. I am trying to pull data from flat ASCII files, but they
>>> do not have a "nice" structure, so a simple "read.table" doesn't work. An
>>> example first half of a data file is below:
>>> ----------------------------------------------------------------------------------------------
>>> 19 c:/data/WF-100/2008/20080911/trk/20080911.013115.007.17.txt
>>> 10 s   name of program that wrote this file trkplt   name of program that wrote this file
>>> 10 GORDON   machine that generated this file   machine that generated this file
>>> 10     3.7 version of program
>>> 10     3.6 version of this data file
>>> 10    5.81 version of Universal Library
>>> 10 20081121.145730 when this file was written
>>> 10 Windows_XP   operating system used   operating system used
>>> *
>>> *       radar characteristics
>>> 11 WF-100
>>> 11 20000000  A/D rate, samples/second
>>> 11 7.5  bin width, m
>>> 11 800  nominal PRF, Hz
>>> 11  0.25  nominal pulse width, microsec
>>> 11 0  tuning, volts
>>> 11 3.19779  nominal wave length, cm
>>> -----------------------------------------------------------------------------------------------
>>> ..the file goes on from there...
>>>
>>> How would I go about getting this data into some kind of useful format?
>>> This is one of about 1000 files I will need to go through. I would ideally
>>> like to get these into a format with each data file as a row, with columns
>>> for the various values and the description text removed (version of
>>> program, file version, tuning volts, etc.).
>>>
>>> I'm not looking for a cut-and-paste answer, but perhaps some direction on
>>> where I should start. I have only done basic .csv, table, and line inputs
>>> up until now.
>>>
>>> Thanks for any advice
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Trouble-pulling-data-from-a-messy-ASCII-file...-tp21059239p21059239.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Trouble-pulling-data-from-a-messy-ASCII-file...-tp21059239p21060639.html
> Sent from the R help mailing list archive at Nabble.com.
>
>
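For the plan described in the quoted message (one row per file, then mean/min/max and sorting), a wide table like the one produced above can be written to CSV and summarised in base R. This is a minimal sketch on a made-up three-row table; the column names `program_ver` and `bin_width` mirror the CSV example in the thread, and the values are illustrative only.

```r
# Made-up stand-in for the wide table (illustrative values only):
# one row per data file, one column per variable, all parsed as text.
wide <- data.frame(
  fileName    = c("a.txt", "b.txt", "c.txt"),
  program_ver = c("3.7", "3.7", "3.6"),
  bin_width   = c("7.5", "15", "7.5"),
  stringsAsFactors = FALSE
)

# Parsed values are character strings; convert numeric columns before summarising.
wide$bin_width <- as.numeric(wide$bin_width)

# One row per file, in the CSV layout from the quoted message.
csv_path <- file.path(tempdir(), "headers.csv")
write.csv(wide, csv_path, row.names = FALSE)

# Summary statistics for one variable, then the table sorted by it.
summary(wide$bin_width)   # min, quartiles, mean, max
c(min = min(wide$bin_width), mean = mean(wide$bin_width), max = max(wide$bin_width))
sorted <- wide[order(wide$bin_width), ]
sorted
```

`order()` keeps ties in their original order, so files with equal bin widths stay in file order.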





