[Rd] sweep sanity checking?

Petr Savicky savicky at cs.cas.cz
Fri Jul 27 09:10:06 CEST 2007


When I was preparing the patch of sweep submitted on July 25, I was
unaware of the code by Heather Turner. She suggested a very elegant
solution, if STATS is a vector and we want to use meaningful recycling
in full generality. I would like to suggest a combined solution, which
uses Heather Turner's algorithm if check.margin=FALSE (default) and STATS
is a vector and my previous algorithm, if check.margin=TRUE or STATS is
an array. The suggestion is

  # combined from the original code of sweep without warnings and from
  # https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html by Robin Hankin
  # https://stat.ethz.ch/pipermail/r-help/2005-June/074001.html by Heather Turner
  # https://stat.ethz.ch/pipermail/r-devel/2007-June/046217.html by Ben Bolker
  # with some further modifications by Petr Savicky
  sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=FALSE, ...)
  {
      FUN <- match.fun(FUN)
      dims <- dim(x)
      dimmargin <- dims[MARGIN]
      if (is.null(dim(STATS))) {
          dimstats <- length(STATS)
      } else {
          dimstats <- dim(STATS)
          check.margin <- TRUE
      }
      s <- length(STATS)
      if (s > prod(dimmargin)) {
          warning("length of STATS greater than the extent of dim(x)[MARGIN]")
      } else if (check.margin) {
          dimmargin <- dimmargin[dimmargin > 1]
          dimstats <- dimstats[dimstats > 1]
          if (length(dimstats) > length(dimmargin) ||
              any(dimstats != dimmargin[seq(along.with=dimstats)]))
              warning("length(STATS) or dim(STATS) do not match dim(x)[MARGIN]")
      } else {
          cumDim <- c(1, cumprod(dimmargin))
          upper <- min(cumDim[cumDim >= s])
          lower <- max(cumDim[cumDim <= s])
          if (upper %% s != 0 || s %% lower != 0)
              warning("STATS does not recycle exactly across MARGIN")
      }
      perm <- c(MARGIN, (1:length(dims))[ - MARGIN])
      FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
  }

Heather presented four examples testing her code:
  sweep(array(1:24, dim = c(4,3,2)), 1, 1:2)    # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1, 1:12)   # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1, 1:24)   # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)  # warning
The second and third example are not really correct, since STATS extends
also to dimensions not included in MARGIN. The problem is better visible
for example in
  sweep(array(1:24, dim = c(4,4,3,3,2,2)), c(1,3), 1:12)
where MARGIN clearly has to contain two dimensions explicitly.
So, I use the examples with a larger margin corresponding to STATS
as follows
  sweep(array(1:24, dim = c(4,3,2)), 1, 1:2)    # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:12) # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1:3, 1:24) # no warning
  sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)  # warning
The current proposal for sweep indeed gives no warning in the first
three examples and gives a warning in the last one.

I did not use the suggestion to call the option warn with default
warn = getOption("warn"). The reason is that there are two
different decisions:
 (1) whether to generate a warning
 (2) what to do with the warning, if it is generated.
The warn option influences (2): the warning may be suppressed,
printed after the return to the top level, printed immediately or
it may be converted to an error. I think that the option
controling (2) should not be mixed with an option which
controls (1). If R has an option controling to which extent
recycling is allowed, then this could be used, but warn
has a different purpose.

I appreciate feedback.

Petr.



More information about the R-devel mailing list