[R] skip lines on a connection

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun May 2 08:44:13 CEST 2004


On Sat, 1 May 2004, Vadim Ogranovich wrote:

> Andy,
> 
> It is surprising that scan() attempts to read anything at all: note that
> I set nmax=0, which AFAIK means read no lines.

You will be telling us next you think the default nmax=-1 means to read a
negative number of lines!  So reading no lines would mean not calling scan
at all, and what would be the point of that?

nmax <= 0 and nlines <= 0 are ignored.

Note carefully what nmax actually means, and it is not what `nlines' 
means!
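
To make the distinction concrete, here is a small sketch (not part of the original exchange; the temporary file and the variable names are made up for illustration). Per the scan() documentation, nmax limits the number of data values read (or records, if `what' is a list), whereas nlines limits the number of lines read.

```r
# Two lines of three numbers each, in a throwaway temporary file.
tmp <- tempfile()
writeLines(c("1 2 3", "4 5 6"), tmp)

# nmax caps the number of data values read ('what' is not a list here).
first_two_values <- scan(tmp, nmax = 2, quiet = TRUE)

# nlines caps the number of lines read.
first_line <- scan(tmp, nlines = 1, quiet = TRUE)

# A non-positive nmax is ignored, so scan() reads to the end of the file.
everything <- scan(tmp, nmax = 0, quiet = TRUE)

unlink(tmp)
```

With the default what=0 the same calls would fail on non-numeric input, which is the error reported in the original question.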

> Thank you for a reference to replicate(). I didn't know about it.

Do read the documentation for scan, too, please.


Note that to read *lines* you do need to read every byte in the file to 
find the EOL marker(s), so readLines() or scan() with NULL in "what" is as 
good as anything.  You can call either on blocks of lines, in a loop.
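
A minimal sketch of the block-wise loop, assuming an already opened text-mode connection. skip_lines() and its block argument are hypothetical names introduced here, not functions supplied by R, and the block size is an arbitrary choice that bounds how many lines are held in memory at once.

```r
# skip_lines() is a hypothetical helper, not part of R.
# It discards up to n lines from an open connection, reading at most
# `block` lines at a time so the temporary character vector stays small.
skip_lines <- function(con, n, block = 10000L) {
  skipped <- 0
  while (skipped < n) {
    want <- min(block, n - skipped)
    got <- length(readLines(con, n = want))
    skipped <- skipped + got
    if (got < want) break  # end of input reached before n lines
  }
  invisible(skipped)
}

# Usage on a small temporary file standing in for the real data.
tmp <- tempfile()
writeLines(sprintf("line %d", 1:25), tmp)
con <- file(tmp, open = "r")  # must be open so the read position persists
n_skipped <- skip_lines(con, 20, block = 7)
next_line <- readLines(con, n = 1)
close(con)
unlink(tmp)
```

Every byte is still read, as noted above; the loop only limits how much is kept in memory at any one time.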

> 
> Thanks,
> Vadim
> 
> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com] 
> Sent: Saturday, May 01, 2004 5:28 PM
> To: Vadim Ogranovich; r-help at stat.math.ethz.ch
> Subject: RE: [R] skip lines on a connection
> 
> 
> Your scan() call doesn't work because default argument what=0; i.e., it
> expects numeric data.  You probably can just use what="".
> 
> The other alternative is to just loop readLines() n times, reading one
> line at a time.  It probably won't be too bad in terms of time, and
> surely will save on memory usage.
> 
> (Try using replicate().)
> 
> HTH,
> Andy
> 
> > From: Vadim Ogranovich
> > 
> > Unfortunately, seek only works in terms of bytes not lines and I only 
> > know how many lines I need to skip, but not bytes.
> > 
> > 
> > -----Original Message-----
> > From: Gabor Grothendieck [mailto:ggrothendieck at myway.com]
> > Sent: Saturday, May 01, 2004 3:44 PM
> > To: r-help at stat.math.ethz.ch
> > Subject: Re: [R] skip lines on a connection
> > 
> > 
> > 
> > 
> > ?seek
> > 
> > Vadim Ogranovich <vograno <at> evafunds.com> writes:
> > 
> > :
> > : Hi,
> > : 
> > : I am looking for an efficient way of skipping big chunks of lines on a
> > : connection (not necessarily at the beginning of the file). One way is to
> > : use read lines, e.g. readLines(1e6), but a) this incurs the overhead of
> > : construction of the return char vector and b) has a (fairly remote)
> > : potential to blow up the memory.
> > : 
> > : Another way would be to use scan(), e.g. 
> > : 
> > : scan(con, skip=1e6, nmax=0)
> > : 
> > : but somehow this doesn't work:
> > : 
> > : > scan(con, skip=10, nmax=0)
> > : Error in scan(con, skip = 10, nmax = 0) : 
> > :  "scan" expected a real, got "A;12;0;"
> > : 
> > : I can stick to readLines, but am curious if there is a better way.
> > : 
> > : I use R-1.8.1 on RH-7.3.
> > : 
> > : Thanks,
> > : Vadim

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



