[R] readLines without skipNul=TRUE causes crash

Anthony Damico ajdamico at gmail.com
Sun Jul 16 12:40:38 CEST 2017


hi, the text file that prompts the segfault is 4GB but only 80,937 lines

> file.info("S:/temp/crash.txt")
                        size isdir mode               mtime               ctime               atime exe
S:/temp/crash.txt 4078192743 FALSE  666 2017-07-15 17:24:35 2017-07-15 17:19:47 2017-07-15 17:19:47  no
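
For readers following the subject line: a minimal sketch of what skipNul changes, using a tiny temporary file rather than the 4GB one. This does not reproduce the segfault; it only shows the documented small-file behaviour of readLines() when a line contains an embedded nul. The file contents and variable names here are made up for illustration.

```r
# Small temp file with the bytes: "a", nul, "b", newline, "c", newline.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x61, 0x00, 0x62, 0x0a, 0x63, 0x0a)), tmp)

# skipNul = TRUE drops the nul bytes and keeps the rest of each line.
with_skip <- readLines(tmp, skipNul = TRUE)

# Default (skipNul = FALSE): per ?readLines an embedded nul terminates the
# line being read, with a warning (suppressed here to keep the output short).
without_skip <- suppressWarnings(readLines(tmp))

print(with_skip)
print(without_skip)
```

On the real file the same call would be readLines("S:/temp/crash.txt", skipNul = TRUE), assuming there is enough memory to hold the result.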




On Sun, Jul 16, 2017 at 6:34 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

> On 16/07/2017 6:17 AM, Anthony Damico wrote:
>
>> thank you for taking the time to write this.  i set it running last
>> night and it's still going -- if it doesn't finish by tomorrow, i will
>> try to find a site to host the problem file and add that link to the bug
>> report so the archive package can be avoided at least.  i'm sorry for
>> the bother
>>
>>
> How big is that text file?  I wouldn't expect my script to take more than
> a few minutes even on a huge file.
>
> My script might have a bug...
>
> Duncan Murdoch
>
>> On Sat, Jul 15, 2017 at 4:14 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>
>>     On 15/07/2017 11:33 AM, Anthony Damico wrote:
>>
>>         hi, i realized that the segfault happens on the text file in a new R
>>         session.  so, creating the segfault-generating text file requires a
>>         contributed package, but prompting the actual segfault does not --
>>         pretty sure that means this is a base R bug?  submitted here:
>>         https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17311
>>         hopefully i am not doing something remarkably stupid.  the text file
>>         itself is 4GB so cannot upload it to bugzilla, and from the
>>         R_AllocStringBuffer error in the previous message, i think most or
>>         all of it needs to be there to trigger the segfault.  thanks!
>>
>>
>>     I don't want to download the big file or install the archive
>>     package. Could you run the code below on the bad file?  If you're
>>     right and it's only nulls that matter, this might allow me to create
>>     a file that triggers the bug.
>>
>>     f <- "bad_file.txt"  # placeholder: put the filename of the bad file here
>>
>>     con <- file(f, open="rb")
>>     count <- 0          # bytes read so far
>>     zeros <- numeric()  # 1-based positions of the nul bytes
>>     repeat {
>>       bytes <- readBin(con, "int", 1000000, size=1)
>>       zeros <- c(zeros, count + which(bytes == 0))
>>       count <- count + length(bytes)
>>       if (length(bytes) < 1000000) break
>>     }
>>     close(con)
>>     cat("File length=", count, "\n")
>>     cat("Nulls:\n")
>>     zeros
>>
>>     Here's some code to recreate a file of the same length with nulls in
>>     the same places, and spaces everywhere else:
>>
>>     size <- count       # total length found by the scan above
>>     f2 <- tempfile()
>>     con <- file(f2, open="wb")
>>     count <- 0          # bytes written so far
>>     while (count < size) {
>>       # zeros is kept relative to the bytes already written (1 = next byte)
>>       nonzeros <- min(c(size - count, 1000000, zeros - 1))
>>       if (nonzeros) {
>>         writeBin(rep(32L, nonzeros), con, size = 1)   # run of spaces
>>         count <- count + nonzeros
>>       }
>>       zeros <- zeros - nonzeros
>>       if (length(zeros) && min(zeros) == 1) {
>>         writeBin(0L, con, size = 1)                   # the nul itself
>>         count <- count + 1
>>         zeros <- zeros[-1] - 1
>>       }
>>     }
>>     close(con)
>>
>>     Duncan Murdoch
>>
>>
>>
>>
>>
>
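
Since the quoted scan can run for a long time on the real file, it may help to check the idea on something tiny first. The following is only a sketch under stated assumptions: find_nuls() is a hypothetical helper (not from the thread) that wraps the chunked scan quoted above, reading raw bytes instead of 1-byte integers, and the rebuild step is done in memory, which is a simpler technique than the quoted chunked writer and only suitable for small files.

```r
# Hypothetical helper (not from the thread): chunked scan for nul bytes.
find_nuls <- function(path, chunk = 1000000L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  count <- 0           # bytes read so far
  zeros <- numeric()   # 1-based positions of nul bytes
  repeat {
    bytes <- readBin(con, "raw", chunk)
    zeros <- c(zeros, count + which(bytes == as.raw(0)))
    count <- count + length(bytes)
    if (length(bytes) < chunk) break
  }
  list(size = count, zeros = zeros)
}

# Ten-byte test file: nuls at positions 3 and 7, spaces everywhere else.
original <- rep(as.raw(32), 10)
original[c(3, 7)] <- as.raw(0)
f_small <- tempfile()
writeBin(original, f_small)

# A tiny chunk size forces the scan to cross several chunk boundaries.
info <- find_nuls(f_small, chunk = 4L)

# In-memory rebuild (small files only): spaces, then nuls at the same places.
rebuilt <- rep(as.raw(32), info$size)
rebuilt[info$zeros] <- as.raw(0)
f_copy <- tempfile()
writeBin(rebuilt, f_copy)
info_copy <- find_nuls(f_copy, chunk = 4L)

print(info)
print(identical(info, info_copy))
```

If the two scans agree on a small file, a slow run on the 4GB file is more likely down to the cost of growing the zeros vector when there are many nuls than to a logic error, though that is a guess rather than something confirmed in the thread.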




More information about the R-help mailing list