[Rd] capture.output(): Using a rawConnection() [linear] instead of textConnection() [exponential]?

Henrik Bengtsson hb at biostat.ucsf.edu
Tue Feb 4 04:22:27 CET 2014


I've noticed that the processing time for the default capture.output()
grows exponentially in the number of characters outputted/captured.
The default settings sinks to a temporary textConnection().  When
instead sinking to a rawConnection(), the processing time becomes
linear.  See below example and attached PNG figure [also at
http://xfer.aroma-project.org/tmp/20140203/capture.output_text-vs-raw.png].

I know little about text encoding, but I wouldn't be surprised I'm
overseeing encoding issues when sinking to raw followed by
rawToChar().  According to the below example, both approaches seem to
work with my current setup (sessionInfo() below), but on the other
hand it's only writing an ASCII "a" character multiple times.

Q. Is it possible to utilize rawConnection() instead of
textConnection() and still get the same output regardless of
encodings?

Q. What causes textConnection() to be exponential when rawConnection()
is linear?


REPRODUCIBLE EXAMPLE:

captureViaText <- function(..., local=TRUE, envir=parent.frame()) {
  eval({
    # Basically what capture.output() does by default
    file <- textConnection("rval", open="w", local=local)
    capture.output(..., file=file)
    close(file)
    rval
  }, envir=envir)
} # captureViaText()

captureViaRaw <- function(..., local=TRUE, envir=parent.frame()) {
  eval({
    # Basically what capture.output() does by default
    file <- rawConnection(raw(0L), open="w")
    capture.output(..., file=file)
    res <- rawConnectionValue(file)
    close(file)
    res <- rawToChar(res)
    res <- unlist(strsplit(res, split="\n", fixed=TRUE))
    res
  }, envir=envir)
} # captureViaRaw()


data <- data.frame(size=10^seq(from=3, to=7.9, by=0.1),
captureViaText=NA, captureViaRaw=NA)
names <- colnames(data)[-1]
for (kk in which(is.na(data[[ncol(data)]]))) {
  # String to output
  x <- rep("a", times=data$size[kk])
  x[seq(from=1L, to=length(x), by=1e3)] <- "\n"
  x <- paste(x, collapse="")
  cat(sprintf("Output size: %d characters\n", nchar(x)))

  # Benchmark
  values <- list()
  for (name in names) {
    print(name)
    fcn <- get(name, mode="function")
    t <- system.time({
      res <- fcn(cat(x))
    }, gcFirst=TRUE)
    n <- sum(nchar(res))
    values[[name]] <- res
    rm(list="res")
    cat(sprintf("%s: %.2g seconds to capture %d characters\n", name, t[3], n))
    data[kk,name] <- t[3]
  } # for (name ...)

  # Assert same captures
  for (cc in seq_along(values)) stopifnot(identical(values[[cc]], values[[1]]))
  rm(list="values")
} # for (size ...)

local({
  png("capture.output_text-vs-raw.png", width=840, height=840)
  on.exit(dev.off())
  plot(data[,c(1,2)], type="n", xlab="Number of characters", ylab="Time (s)")
  for (cc in 2:ncol(data)) lines(data[,c(1,cc)], lwd=2, col=cc)
  legend("topleft", lwd=2, col=2:ncol(data), names, bty="n")
})

> sessionInfo()
R Under development (unstable) (2014-01-30 r64901)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.1.0

Thanks,

Henrik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: capture.output_text-vs-raw.png
Type: image/png
Size: 6579 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20140203/ee20d00f/attachment.png>


More information about the R-devel mailing list