[R] Using readLines on a file without resetting internal file offset parameter?

William Dunlap wdunlap at tibco.com
Wed Oct 29 17:59:26 CET 2014


I meant you should close the file when you are done with it, not after
every few lines. File descriptors are a limited resource.
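Here is a minimal sketch of that advice, assuming the goal is line-at-a-time processing. It opens the connection once with "r", loops over readLines(con, n = 1) until that returns character(0), and lets on.exit() close the connection a single time. The helper name process_lines and the callback handle_line are hypothetical, for illustration only.

```r
# Hypothetical helper: read `path` one line at a time, calling
# handle_line() on each line, and return the number of lines seen.
process_lines <- function(path, handle_line) {
  con <- file(path, open = "r")   # open once, so the file offset is kept between calls
  on.exit(close(con))             # close exactly once, even if handle_line() signals an error
  n <- 0L
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0L) break # character(0) means end of file
    handle_line(line)
    n <- n + 1L
  }
  n
}

# Usage: a three-line temporary file
tf <- tempfile()
writeLines(c("a", "b", "c"), tf)
seen <- character(0)
count <- process_lines(tf, function(x) seen <<- c(seen, x))
unlink(tf)
```

Reading one line per call is simple but can be slow on large files. Passing a larger n to each readLines() call still keeps memory bounded while reducing per-call overhead.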

As for the rationale for the default behavior, there is a common use
pattern of reading and parsing an entire file (or URL, etc.), examining
the results, and trying again with a different parsing scheme. In that
case the default behavior works well.
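To illustrate that pattern, here is a small sketch. The sample data and the two separators are made up for the example. The connection is created by file() but never opened, so every readLines() call opens the file, reads from the first line, and closes it again. As a result, a second parsing attempt needs no rewinding.

```r
tf <- tempfile()
writeLines(c("x,y", "1,2", "3,4"), tf)

con <- file(tf)                   # created, but NOT opened
first  <- readLines(con, n = 1)   # opens, reads "x,y", closes again
second <- readLines(con, n = 1)   # starts from the top again: "x,y"

# Attempt 1: wrong separator guessed, so each line stays a single field
attempt1 <- strsplit(readLines(con), ";")
# Attempt 2: the whole file is read again from line 1, split on ","
attempt2 <- strsplit(readLines(con), ",")

still_unopened <- !isOpen(con)    # readLines() left the connection closed
close(con)                        # destroy the connection object
unlink(tf)
```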

In any case, I assume the behavior is documented in help("file").

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Oct 29, 2014 at 9:51 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
> Thanks for the response! I'd rather keep the file open than close it, since
> it would flush the internal buffer. The whole reason I'm doing this is to
> take advantage of the buffering and closing it would defeat the purpose.
>
> I actually just found a solution, which is to open the file with the "r"
> flag explicitly. I.e. the following is what I want.
>
> -----
>
> bash $ echo 1 > testfile
> bash $ echo 2 >> testfile
> bash $ cat testfile
> 1
> 2
>
> bash $ R
> R > f <- file('testfile', 'r')
> R > readLines(f, n = 1)
> [1] "1"
> R > readLines(f, n = 1)
> [1] "2"
> R > readLines(f, n = 1)
> character(0)
>
> -----
>
> If you want to use writeLines in this same fashion, you'll need to open
> the file with the "w" flag as well.
>
> It's very odd that file('filename') will let you read from it, but will not
> act the same as file('filename', 'r') when it comes to readLines. Is this a
> bug or is there some reasoning behind this? Regardless, it's certainly
> extremely unintuitive.
>
> Thanks again for the response!
>
> Cheers,
> Thomas
>
>
> On 10/29/2014 12:22 PM, William Dunlap wrote:
>>
>> Open your file object before calling readLines and close it when you
>> are done with a sequence of calls to readLines.
>>
>>    > tf <- tempfile()
>>    > cat(sep="\n", letters[1:10], file=tf)
>>    > f <- file(tf)
>>    > open(f)
>>    > # or f <- file(tf, "r") instead of previous 2 lines
>>    > readLines(f, n=1)
>>    [1] "a"
>>    > readLines(f, n=1)
>>    [1] "b"
>>    > readLines(f, n=2)
>>    [1] "c" "d"
>>    > close(f)
>>
>> I/O operations on an unopened connection generally open it, do the
>> operation, then close it.
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Wed, Oct 29, 2014 at 8:23 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> I would like to read a file line by line, but I would rather not load all
>>> lines into memory first. I've tried using readLines with n = 1, but that
>>> seems to reset the internal file descriptor's file offset after each call.
>>> I.e. this is the current behavior:
>>>
>>> -------
>>>
>>> bash $ echo 1 > testfile
>>> bash $ echo 2 >> testfile
>>> bash $ cat testfile
>>> 1
>>> 2
>>>
>>> bash $ R
>>> R > f <- file('testfile')
>>> R > readLines(f, n = 1)
>>> [1] "1"
>>> R > readLines(f, n = 1)
>>> [1] "1"
>>>
>>> -------
>>>
>>> I would like the behavior to be:
>>>
>>> -------
>>>
>>> bash $ R
>>> R > f <- file('testfile')
>>> R > readLines(f, n = 1)
>>> [1] "1"
>>> R > readLines(f, n = 1)
>>> [1] "2"
>>>
>>> -------
>>>
>>> I'm coming to R from a Python background, where the default behavior is
>>> exactly the opposite: when you read a line from a file, it is your
>>> responsibility to use seek explicitly to get back to the original position
>>> in the file (this is rarely necessary, though). Is there some flag to turn
>>> off the default behavior of resetting the file offset in R?
>>>
>>> Cheers,
>>> Thomas
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.