[Rd] write.dcf/read.dcf cycle converts missing entry to "NA" (PR#9796)

bill at insightful.com
Tue Jul 17 18:58:10 CEST 2007


Full_Name: Bill Dunlap
Version: 2.5.0
OS: Red Hat Enterprise Linux WS release 3 (Taroon Update 6)
Submission from: (NULL) (24.17.60.30)


If you read a DCF file with read.dcf(file, fields=c("Field",...))
and a record in the file does not contain the requested field "Field",
read.dcf() puts a character NA (NA_character_) in that entry of its
output matrix.  If you then pass that matrix to write.dcf(), it
writes the line "Field: NA".  A subsequent read.dcf() on write.dcf()'s
output file then returns the string "NA", not a character NA, in the
entry for "Field".  I think that write.dcf() should not write a line
to the output file for any entry of the input matrix that is a
character NA.
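
As a small illustration of the distinction described above (a sketch,
not part of the original report): a character NA and the string "NA"
are different values, yet they become indistinguishable once pasted
into a "Field: value" line.  The field name "Depends" is only an
example, and the use of paste() here is an assumption standing in for
whatever formatting write.dcf() does internally.

```r
# A character NA and the literal string "NA" are not the same value.
x <- c(NA_character_, "NA")
is.na(x)                      # TRUE FALSE

# Once pasted into a "Field: value" line, the difference is lost
# (paste() is an assumed stand-in for write.dcf()'s formatting).
lines <- paste("Depends:", x)
lines                         # "Depends: NA" "Depends: NA"
```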

Here is a test function to demonstrate the problem.  It returns
TRUE when a write.dcf/read.dcf cycle does not change the data.

  test.write.dcf <- function () {
     origFile <- tempfile()
     copyFile <- tempfile()
     on.exit(unlink(c(origFile, copyFile)))
     # Three records; testA has an empty Depends field, and some records
     # omit Depends and/or Suggests entirely.
     writeLines(c("Package: testA", "Version: 0.1-1", "Depends:", "",
                  "Package: testB", "Version: 2.1"  , "Suggests: testA", "",
                  "Package: testC", "Version: 1.3.1", ""),
                origFile)
     # Omitted fields come back as character NA.
     orig <- read.dcf(origFile,
                      fields=c("Package","Version","Depends","Suggests"))
     # Round trip: write the matrix out and read it back in.
     write.dcf(orig, copyFile, width = 72)
     copy <- read.dcf(copyFile,
                      fields=c("Package","Version","Depends","Suggests"))
     value <- all.equal(orig, copy)
     # On a mismatch, attach both matrices for inspection.
     if (!identical(value, TRUE)) {
        attr(value, "orig") <- orig
        attr(value, "copy") <- copy
     }
     value
  }
Currently we get
  > test.write.dcf()
  [1] "'is.NA' value mismatch: 0 in current 4 in target"
  attr(,"orig")
       Package Version Depends Suggests
  [1,] "testA" "0.1-1" ""      NA
  [2,] "testB" "2.1"   NA      "testA"
  [3,] "testC" "1.3.1" NA      NA
  attr(,"copy")
       Package Version Depends Suggests
  [1,] "testA" "0.1-1" ""      "NA"
  [2,] "testB" "2.1"   "NA"    "testA"
  [3,] "testC" "1.3.1" "NA"    "NA"
With the modified write.dcf() given below it returns TRUE.

The diff would be
19,22c19,24
<     eor <- character(nr * nc)
<     eor[seq.int(1, nr - 1) * nc] <- "\n"
<     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(t(x)),
<         style = "list", width = width, indent = indent), eor,
---
>     tx <- t(x)
>     not.na <- c(!is.na(tx))
>     eor <- character(sum(not.na))
>     eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
>     writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
>         style = "list", width = width, indent = indent)[not.na], eor,

and the entire function would be

`write.dcf` <-
function (x, file = "", append = FALSE, indent = 0.1 * getOption("width"),
    width = 0.9 * getOption("width"))
{
    if (!is.data.frame(x))
        x <- data.frame(x)
    x <- as.matrix(x)
    mode(x) <- "character"
    if (file == "")
        file <- stdout()
    else if (is.character(file)) {
        file <- file(file, ifelse(append, "a", "w"))
        on.exit(close(file))
    }
    if (!inherits(file, "connection"))
        stop("'file' must be a character string or connection")
    nr <- nrow(x)
    nc <- ncol(x)
    tx <- t(x)
    not.na <- c(!is.na(tx))
    eor <- character(sum(not.na))
    eor[ c(diff(c(col(tx))[not.na]),0)==1 ] <- "\n"
    writeLines(paste(formatDL(rep.int(colnames(x), nr), c(tx),
        style = "list", width = width, indent = indent)[not.na], eor,
        sep = ""), file)
}
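
To make the record-separator logic in the modified function easier to
follow, here is a sketch that runs just those lines on the 'orig'
matrix from the test output above.  The matrix is rebuilt by hand, and
the helper name 'rec' is introduced only for this illustration; the
remaining names match the function.

```r
# Same 3 x 4 character matrix as 'orig' in the test output.
x <- matrix(c("testA", "0.1-1", "", NA,
              "testB", "2.1",   NA, "testA",
              "testC", "1.3.1", NA, NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL,
                            c("Package", "Version", "Depends", "Suggests")))

tx <- t(x)                    # fields in rows, one record per column
not.na <- c(!is.na(tx))       # TRUE for entries that get written
rec <- c(col(tx))[not.na]     # record number of each written entry

# Put "\n" after the last written entry of every record but the final
# one: that is where the record number goes up by one.
eor <- character(sum(not.na))
eor[c(diff(rec), 0) == 1] <- "\n"

rec                           # 1 1 1 2 2 2 3 3
which(eor == "\n")            # 3 6
```

Entries 3 and 6 are the last lines written for testA and testB, so the
blank separator lines fall between records even though each record now
contributes a different number of lines.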


