[R] Using readLines on a file without resetting internal file offset parameter?

Thomas Nyberg tomnyberg at gmail.com
Wed Oct 29 18:12:54 CET 2014


Yeah of course you should close the file when done. I didn't give a 
complete code snippet.

In any case, a quick glance at the documentation seems to imply that 
opening a file as file('filename') defers the choice of mode (i.e. 
'r', 'w', etc.) until the connection is first used. In my case the first 
use is a read, so the mode should presumably be set to 'r'. However, as 
my examples show, this behaves differently from opening it as 
file('filename', 'r') in the first place.
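
For reference, here is a minimal sketch of the difference, assuming a 
two-line temporary file. The variable names (tf, f1, f2, a1, ...) are 
only for illustration; isOpen() reports whether the connection is 
currently open.

```r
# Write a small two-line test file; tempfile() avoids touching real files.
tf <- tempfile()
writeLines(c("1", "2"), tf)

# Unopened connection: each readLines() call opens it, reads, and closes it.
f1 <- file(tf)
open_before <- isOpen(f1)     # not open yet
a1 <- readLines(f1, n = 1)
a2 <- readLines(f1, n = 1)    # starts from the top of the file again
open_after <- isOpen(f1)      # closed again after the read
close(f1)                     # destroys the connection object

# Explicitly opened connection: the offset is kept between calls.
f2 <- file(tf, "r")
b1 <- readLines(f2, n = 1)
b2 <- readLines(f2, n = 1)    # continues where the previous call stopped
close(f2)

unlink(tf)
```

With the unopened connection both reads return the first line; with the 
explicitly opened one the second read returns the second line.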

I do agree that it is reasonable for the default behavior of 
readLines/writeLines to be to reset the file offset each time, but I 
certainly do not agree that this should depend on whether the file is 
opened without 'r' and then read from, versus being opened with 'r' in 
the first place. That kind of side effect makes no sense to me and is 
entirely unintuitive. If the goal was for the default to reset the file 
offset, a reasonable approach would be a flag in readLines such as 
reset_fileoffset = TRUE.
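
Something close to that can be approximated in user code. The following 
is only a sketch: read_next is a hypothetical helper (it is not part of 
R), which opens the connection on first use and leaves it open so that 
the offset carries over to the next call. The caller is still 
responsible for closing the connection.

```r
# Hypothetical helper, not part of base R: read the next n lines from a
# connection, opening it for reading first if it is not already open.
read_next <- function(con, n = 1L) {
  if (!isOpen(con)) {
    open(con, "r")   # stays open after this function returns
  }
  readLines(con, n = n)
}

tf <- tempfile()
writeLines(c("1", "2"), tf)

f <- file(tf)            # created without an explicit mode
first  <- read_next(f)   # opens the connection, reads line 1
second <- read_next(f)   # reads line 2, since the offset was kept
third  <- read_next(f)   # nothing left to read
close(f)                 # close once, when done with the file

unlink(tf)
```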

In any case, thanks so much for the help!

Cheers,
Thomas

On 10/29/2014 12:59 PM, William Dunlap wrote:
> I meant you should close the file when you are done with it, not after
> every few lines.
> File descriptors are a limited resource.
>
> As for the rationale for the default behavior, there is a common use
> pattern of reading
> and parsing an entire file (or url, etc.), examining the results, and trying
> again with a different parsing scheme.  In that case the default
> behavior works well.
>
> In any case, I assume the behavior is documented in help("file").
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Wed, Oct 29, 2014 at 9:51 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
>> Thanks for the response! I'd rather keep the file open than close it, since
>> it would flush the internal buffer. The whole reason I'm doing this is to
>> take advantage of the buffering and closing it would defeat the purpose.
>>
>> I actually just found a solution which is to open the files with the "r"
>> flag explicitly. I.e. the following is what I want.
>>
>> -----
>>
>> bash $ echo 1 > testfile
>> bash $ echo 2 >> testfile
>> bash $ cat testfile
>> 1
>> 2
>>
>> bash $ R
>> R > f <- file('testfile', 'r')
>> R > readLines(f, n = 1)
>> [1] "1"
>> R > readLines(f, n = 1)
>> [1] "2"
>> R > readLines(f, n = 1)
>> character(0)
>>
>> -----
>>
>> If you want to use writeLines in this same fashion, you'll likewise need
>> to open the file explicitly, with the "w" flag.
>>
>> It's very odd that file('filename') will let you read from it, but will not
>> act the same as file('filename', 'r') when it comes to readLines. Is this a
>> bug or is there some reasoning behind this? Regardless, it's certainly
>> extremely unintuitive.
>>
>> Thanks again for the response!
>>
>> Cheers,
>> Thomas
>>
>>
>> On 10/29/2014 12:22 PM, William Dunlap wrote:
>>>
>>> Open your file object before calling readLines and close it when you
>>> are done with
>>> a sequence of calls to readLines.
>>>
>>>     > tf <- tempfile()
>>>     > cat(sep="\n", letters[1:10], file=tf)
>>>     > f <- file(tf)
>>>     > open(f)
>>>     > # or f <- file(tf, "r") instead of previous 2 lines
>>>     > readLines(f, n=1)
>>>     [1] "a"
>>>     > readLines(f, n=1)
>>>     [1] "b"
>>>     > readLines(f, n=2)
>>>     [1] "c" "d"
>>>     > close(f)
>>>
>>> I/O operations on an unopened connection generally open it, do the
>>> operation,
>>> then close it.
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>> On Wed, Oct 29, 2014 at 8:23 AM, Thomas Nyberg <tomnyberg at gmail.com>
>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I would like to read a file line by line, but I would rather not load all
>>>> lines into memory first. I've tried using readLines with n = 1, but that
>>>> seems to reset the internal file descriptor's file offset after each
>>>> call.
>>>> I.e. this is the current behavior:
>>>>
>>>> -------
>>>>
>>>> bash $ echo 1 > testfile
>>>> bash $ echo 2 >> testfile
>>>> bash $ cat testfile
>>>> 1
>>>> 2
>>>>
>>>> bash $ R
>>>> R > f <- file('testfile')
>>>> R > readLines(f, n = 1)
>>>> [1] "1"
>>>> R > readLines(f, n = 1)
>>>> [1] "1"
>>>>
>>>> -------
>>>>
>>>> I would like the behavior to be:
>>>>
>>>> -------
>>>>
>>>> bash $ R
>>>> R > f <- file('testfile')
>>>> R > readLines(f, n = 1)
>>>> [1] "1"
>>>> R > readLines(f, n = 1)
>>>> [1] "2"
>>>>
>>>> -------
>>>>
>>>> I'm coming to R from a python background, where the default behavior is
>>>> exactly the opposite. I.e. when you read a line from a file it is your
>>>> responsibility to use seek explicitly to get back to the original
>>>> position
>>>> in the file (this is rarely necessary though). Is there some flag to turn
>>>> off the default behavior of resetting the file offset in R?
>>>>
>>>> Cheers,
>>>> Thomas
>>>>