[Rd] read.dcf with no newline ending: gzfile drops last line

John Muschelli muschellij2 at gmail.com
Mon Nov 14 16:32:16 CET 2016


I don't know if this is a bug per se, but it is undesired behavior in
read.dcf.  read.dcf takes a file argument and passes it to gzfile if
it's a character string:
    if (is.character(file)) {
        file <- gzfile(file)
        on.exit(close(file))
    }
This gzfile connection is passed to readLines (line #39):
lines <- readLines(file)

If there is no newline at the end of the file, readLines on the gzfile
connection doesn't give a warning (a warning would, I think, be the
appropriate behavior).  If a DESCRIPTION file happens to have no newline
at the end of it (odd, but it may happen), then the last tag is dropped:

> x = "Package: test
+ Type: Package"
>
> ######################################
> # No Newline in file
> ######################################
> fname = tempfile()
> writeLines(x, fname, sep = "")
>
> ### readLines with character - warning but all fields
> readLines(fname)
[1] "Package: test" "Type: Package"
Warning message:
In readLines(fname) :
  incomplete final line found on
'/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//Rtmpz95dsT/file180a65a6b745'
> ### readLines with file connection - warning but all fields
> file_con <- file(fname)
> readLines(file_con)
[1] "Package: test" "Type: Package"
Warning message:
In readLines(file_con) :
  incomplete final line found on
'/var/folders/1s/wrtqcpxn685_zk570bnx9_rr0000gr/T//Rtmpz95dsT/file180a65a6b745'
>
> ### readLines with gzfile connection
> ## no warning and drops last field
> gz_con = gzfile(fname)
> readLines(gz_con) # ONLY 1 line!
[1] "Package: test"
>
> ######################################
> # Newline at end of file - fine
> ######################################
> ### readLines with gzfile connection
> ## no warning and all fields are read
> writeLines(x, fname, sep = "\n")
> gz_con = gzfile(fname)
> readLines(gz_con)
[1] "Package: test" "Type: Package"
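
To find files affected by this, one can check whether the last byte of
the file is a newline.  The following is only a sketch:
ends_with_newline is a hypothetical helper name, not a base R function,
and it assumes an uncompressed file small enough to read into memory.

```r
# Hypothetical helper: TRUE if the (uncompressed) file at `path` ends in "\n".
ends_with_newline <- function(path) {
    size <- file.size(path)
    # Missing or empty files have no final newline
    if (is.na(size) || size == 0) return(FALSE)
    # Read the raw bytes and inspect the last one (0x0a is "\n")
    bytes <- readBin(path, what = "raw", n = size)
    identical(bytes[length(bytes)], as.raw(0x0a))
}
```

Calling ends_with_newline(fname) before read.dcf(fname) would flag a
DESCRIPTION file whose last field is at risk of being dropped.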

As a workaround, I currently pass file(fname) to read.dcf instead of the
file name, so that all fields are read and a warning can be raised.  I
didn't see anything in the read.dcf help about this.  The readLines help
states clearly:
"If the final line is incomplete (no final EOL marker) the behaviour
depends on whether the connection is blocking or not", but it's not
100% clear that read.dcf uses gzfile even if the file is not compressed.
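
The file(fname) workaround can be wrapped up as below.  This is a
minimal sketch: read_dcf_via_file is a hypothetical name, not part of
base R, and it relies only on read.dcf accepting a connection in place
of a file name.

```r
# Hypothetical wrapper (not part of base R): read a DCF file through an
# explicit file() connection so that read.dcf does not call gzfile() itself.
read_dcf_via_file <- function(path, ...) {
    con <- file(path)      # unopened connection; it is opened when read
    on.exit(close(con))    # always release the connection on exit
    read.dcf(con, ...)     # further arguments (fields, all, ...) pass through
}
```

Usage would be read_dcf_via_file(fname) wherever read.dcf(fname) was
called before.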


Thanks
John


