[R] Scanning data files line-by-line

R A F raf1729 at hotmail.com
Wed Apr 30 17:21:23 CEST 2003


Hi all, thanks to everyone again for helping out.  I don't want to
generate too many messages, but this problem seems common enough that
maybe it's worth a summary.

What I can do is this.  Let's say "file" has lines of the form double,
string, double, with a variable number of spaces between fields.

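aaa <- file( "file", "r" )

# what = list( 0, "", 0 ) asks for double, string, double on each line
while( length( ( x <- scan( aaa, what = list( 0, "", 0 ),
                            nlines = 1 ) )[[1]] ) > 0 )
{
   # No second check of length( x[[1]] ) is needed in here: at end of
   # file scan returns zero-length fields (there is no EOF character
   # to read), and the loop condition has already tested for that.

   # start processing x
}

close( aaa )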

Here x is a list and x[[1]] is the first field, etc.
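
For anyone who wants to try the loop, here is a minimal self-contained
sketch.  The temporary file, its three sample lines, and the names tmp,
con, n_lines and total are made up for illustration; quiet = TRUE only
suppresses scan's "Read 1 record" messages.

```r
# Sample data (made up): double, string, double with uneven spacing.
tmp <- tempfile()
writeLines(c("1.5  abc   2", "3 def 4.25", "5   ghi  6"), tmp)

con <- file(tmp, "r")
n_lines <- 0
total <- 0

# scan() returns zero-length fields once nothing is left to read,
# which is what ends the loop.
while (length((x <- scan(con, what = list(0, "", 0), nlines = 1,
                         quiet = TRUE))[[1]]) > 0) {
  n_lines <- n_lines + 1
  total <- total + x[[1]] + x[[3]]   # x[[2]] is the string field
}

close(con)
unlink(tmp)
```

Opening the connection once and scanning it repeatedly is what lets
each call pick up where the previous one stopped.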

Professor Ripley also suggested textConnections, but I didn't
experiment -- I'm usually happy to find something that works.  :-)
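
For completeness, the readLines/textConnection route would probably
look something like the sketch below.  I have not used it on real data;
the sample lines and the names tmp, con, rows and splits are invented
for the example.  It also shows that strsplit can cope with a variable
number of spaces if it is given a regular expression instead of " ".

```r
# Sample data (made up): double, string, double with uneven spacing.
tmp <- tempfile()
writeLines(c("1 a  3", "4  b 6"), tmp)

con <- file(tmp, "r")
rows <- list()
splits <- list()

# readLines() returns character(0) at end of file, which ends the loop.
while (length(line <- readLines(con, n = 1)) > 0) {
  # Parse the single line with scan() through a textConnection.
  tc <- textConnection(line)
  rows[[length(rows) + 1]] <- scan(tc, what = list(0, "", 0), quiet = TRUE)
  close(tc)

  # Alternative: split on runs of whitespace; fields stay as strings.
  splits[[length(splits) + 1]] <- strsplit(trimws(line), "[[:space:]]+")[[1]]
}

close(con)
unlink(tmp)
```

The scan version converts each field to the type given in what, while
the strsplit version leaves everything as character and would need
as.numeric on the numeric fields.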

Thanks again.

>From: Spencer Graves <spencer.graves at pdf.com>
>To: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>CC: R-help at stat.math.ethz.ch, R A F <raf1729 at hotmail.com>
>Subject: Re: [R] Scanning data files line-by-line
>Date: Wed, 30 Apr 2003 07:28:03 -0700
>
>With a "connection" instead of a "file", there is no counterpart to 
>"count.fields" to summarize what's available?
>
>Thanks,
>Spencer Graves
>
>Prof Brian Ripley wrote:
>>On Wed, 30 Apr 2003, R A F wrote:
>>
>>
>>>Thanks very much.  I guess the answer leads to more questions:
>>>
>>>(a) What if I don't know the number of lines?  So I would like to use
>>>    a while loop until readLines hits an EOF character.  Would that
>>>    be possible?
>>
>>
>>Yes. After you reach the end of the file you will get character(0) since
>>
>>Value:
>>
>>      A character vector of length the number of lines read.
>>
>>and zero lines would have been read.
>>
>>
>>>(b) When readLines is used, a string is returned.
>>
>>
>>Not quite: a character vector is returned.
>>
>>
>>>I'd like to split
>>>    the string into fields, and Andy Liaw suggested strsplit, but the
>>>    number of spaces between fields is variable.  So for example, one
>>>    line could be 1 space 2 space space 3 and the next line could be
>>>    4 space space 5 space 6, so I could not do a strsplit using " ".
>>>
>>>    Really what I know is the variable type of each field -- for
>>>    example, each line is double, string, then double, etc.  How
>>>    would one use this information to split the string given by
>>>    readLines?
>>
>>
>>You could use scan on the line: it works on textConnections.
>>
>>
>>>Thanks very much again!


