[Rd] sweep sanity checking?

Petr Savicky savicky at cs.cas.cz
Fri Jul 13 22:37:36 CEST 2007


I would like to suggest a replacement for the curent function
sweep based on the two previous suggestions posted at
  https://stat.ethz.ch/pipermail/r-help/2005-June/073989.html
and
  http://wiki.r-project.org/rwiki/doku.php?id=rdoc:base:sweep
with some extensions.

My suggestion is to use one of the following two variants. They
differ in whether length(STATS) == 1 is allowed without a warning
in the stricter (default) check or not. I don't know, what is better.

  sweep1 <- function (x, MARGIN, STATS, FUN = "-", check.margin=TRUE, ...)
  {
      FUN <- match.fun(FUN)
      dims <- dim(x)
      dimstat <- if (is.null(dim(STATS))) length(STATS) else dim(STATS)
      if (length(MARGIN) < length(dimstat)) {
          STATS <- drop(STATS)
          dimstat <- if (is.null(dim(STATS))) length(STATS) else dim(STATS)
      }
      if (check.margin && length(STATS) > 1 &&
          (length(dimstat)!=length(MARGIN) || any(dims[MARGIN]!=dimstat))) {
          warning("length(STATS) or dim(STAT) do not match dim(x)[MARGIN]")
      } else if (prod(dims[MARGIN]) %% length(STATS)!=0)
          warning("prod(dim(x)[MARGIN]) is not a multiple of length(STATS)")
      perm <- c(MARGIN, (1:length(dims))[-MARGIN])
      FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
  }

  sweep2 <- function (x, MARGIN, STATS, FUN = "-", check.margin=TRUE, ...)
  {
      FUN <- match.fun(FUN)
      dims <- dim(x)
      dimstat <- if (is.null(dim(STATS))) length(STATS) else dim(STATS)
      if (length(MARGIN) < length(dimstat)) {
          STATS <- drop(STATS)
          dimstat <- if (is.null(dim(STATS))) length(STATS) else dim(STATS)
      }
      if (check.margin &&
          (length(dimstat)!=length(MARGIN) || any(dims[MARGIN]!=dimstat))) {
          warning("length(STATS) or dim(STAT) do not match dim(x)[MARGIN]")
      } else if (prod(dims[MARGIN]) %% length(STATS)!=0)
          warning("prod(dim(x)[MARGIN]) is not a multiple of length(STATS)")
      perm <- c(MARGIN, (1:length(dims))[-MARGIN])
      FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
  }

The functions are tested on the following examples.

  a <- array(1:64,dim=c(4,4,4))
  M <- c(1,3);

  b          sweep1(a,M,b) sweep2(a,M,b) sweep1(a,M,b,c=F) sweep2(a,M,b,c=F)

  a[,2,]     -             -             -                 - 
  a[1:2,2,]  warning       warning       -                 - 
  1          -             warning       -                 -
  1:3        warning       warning       warning           warning
  1:16       warning       warning       -                 -

  a <- matrix(1:8,nrow=4,ncol=2);
  M <- 1;

  b          sweep1(a,M,b) sweep2(a,M,b) sweep1(a,M,b,c=F) sweep2(a,M,b,c=F)

  1          -             warning       -                 -
  1:2        warning       warning       -                 -
  1:3        warning       warning       warning           warning
  1:4        -             -             -                 -
  matrix(1:4,nrow=1) # (A)
             -             -             -                 -
  matrix(1:4,ncol=1) # (B)
             -             -             -                 -

  a <- matrix(1:8,nrow=4,ncol=2);
  M <- 2;

  b          sweep1(a,M,b) sweep2(a,M,b) sweep1(a,M,b,c=F) sweep2(a,M,b,c=F)

  1          -             warning       -                 -
  1:2        -             -             -                 -
  1:3        warning       warning       warning           warning
  1:4        warning       warning       warning           warning
  matrix(1:2,ncol=1) # (A)
             -             -             -                 -
  matrix(1:2,nrow=1) # (B)
             -             -             -                 -

Note that cases marked (A) do not generate a warning, although they
should. This is the cost of using drop(STATS), which allows cases
marked (B) without a warning. In my opinion, cases (B) make sense.

Reproducing the tests is possible using the script
  http://www.cs.cas.cz/~savicky/R-devel/verify_sweep.R

Petr.



More information about the R-devel mailing list