[Rd] sweep sanity checking?

Petr Savicky savicky at cs.cas.cz
Wed Jul 25 09:03:21 CEST 2007


I would like to suggest a patch against R-devel-2007-07-24, which
modifies function sweep by including a warning, if dim(STATS) is not
consistent with dim(x)[MARGIN]. If check.margin=FALSE, the simple test whether
prod(dim(x)[MARGIN]) is a multiple of length(STATS) is performed.
If check.margin=TRUE, then a more restrictive test is used, but a limited
recycling is still allowed without warning. Besides generating a warning
in some situations, there is no other change in the behavior of sweep.
The patch is:

--- R-devel_2007-07-24/src/library/base/R/sweep.R	2007-01-04 17:51:53.000000000 +0100
+++ R-devel_2007-07-24-sweep/src/library/base/R/sweep.R	2007-07-24 10:56:18.000000000 +0200
@@ -1,7 +1,21 @@
-sweep <- function(x, MARGIN, STATS, FUN = "-", ...)
+sweep <- function(x, MARGIN, STATS, FUN = "-", check.margin=FALSE, ...)
 {
     FUN <- match.fun(FUN)
     dims <- dim(x)
+    if (check.margin) {
+        dimmargin <- dims[MARGIN]
+        dimmargin <- dimmargin[dimmargin > 1]
+        dimstats <- if (is.null(dim(STATS))) length(STATS) else dim(STATS)
+        dimstats <- dimstats[dimstats > 1]
+        if (length(dimstats) > length(dimmargin) ||
+            any(dimstats != dimmargin[seq(along.with=dimstats)]))
+            warning("length(STATS) or dim(STATS) do not match dim(x)[MARGIN]")
+    } else if (prod(dims[MARGIN]) %% length(STATS) != 0) {
+        if (length(MARGIN) == 1)
+            warning("dim(x)[MARGIN] is not a multiple of length(STATS)")
+        else
+            warning("prod(dim(x)[MARGIN]) is not a multiple of length(STATS)")
+    }
     perm <- c(MARGIN, (1:length(dims))[ - MARGIN])
     FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
 }

The patch uses the default check.margin=FALSE, since this is more backward
compatible. Changing the default to check.margin=TRUE would also be fine
with me and also with Ben Bolker, who told me this in a separate email.

Let me include more comments on the stricter test. If check.margin=TRUE,
then the patch tests whether (after deleting possible dimensions with
only one level) dim(STATS) is a prefix of dim(x)[MARGIN]. Hence, for example,
if dim(x)[MARGIN] = c(k1,k2), the cases
  length(STATS) = 1,
  dim(STATS) = k1,
  dim(STATS) = NULL and length(STATS) = k1,
  dim(STATS) = c(k1,k2)
are accepted without warning. On the other hand, if k1 != k2, then,
for example, dim(STATS)= k2, dim(STATS) = c(k2,k1) generate
a warning, although the simple divisibility condition
   length(STATS) divides prod(dim(x)[MARGIN])
is satisfied. The warning is generated, since in the last two cases,
recycling produces incorrect or at least suspicious result.

In the simplest case, when length(MARGIN)=1 and STATS is a vector,
the cases accepted by the stricter test without warning are exactly the
following two: length(STATS) = 1, length(STATS) = dim(x)[MARGIN].

I tested the patch using the script
  http://www.cs.cas.cz/~savicky/R-devel/verify_sweep1.R
Ben Bolker also tested the patch in his environment.

I appreciate to know the opinion of R core developers on this patch.
Thank you in advance.

Petr.



More information about the R-devel mailing list