[R] sweep() and recycling

Heather Turner Heather.Turner at warwick.ac.uk
Tue Jun 21 17:33:20 CEST 2005


Agreed my examples may be trivial and I'm sure there are more efficient ways to do the same thing, but I disagree that my examples are against the spirit of sweep (). In the first example a vector is swept out from the rows, with one value for odd rows and one value for even rows. In the second example an array of values is swept out across the third dimension. In the third example an array of values is swept out from the full array.

The first example is a natural use of recycling. E.g. 

sweep(matrix(1:100, 50, 2), 1, c(1, 1000), "+", give.warning = TRUE)

is a quicker way of writing

sweep(matrix(1:100, 50, 2), 1, rep(c(1, 1000), 25), "+", give.warning = TRUE)

but your code would give a warning in the first case, even though the intent and the result are exactly the same as in the second case.

As you say, it is only a warning, that can be ignored. However the warning should at least reflect the warning condition used, i.e. warn that the length of STATS does not equal the extent of MARGIN, rather warning that STATS does not recycle exactly.

Heather

>>> Robin Hankin <r.hankin at noc.soton.ac.uk> 06/21/05 02:47pm >>>
Hi

On Jun 21, 2005, at 02:33 pm, Heather Turner wrote:

> I think the warning condition in Robin's patch is too harsh - the 
> following examples seem reasonable to me, but all produce warnings
>
> sweep(array(1:24, dim = c(4,3,2)), 1, 1:2, give.warning = TRUE)
> sweep(array(1:24, dim = c(4,3,2)), 1, 1:12, give.warning = TRUE)
> sweep(array(1:24, dim = c(4,3,2)), 1, 1:24, give.warning = TRUE)
>


The examples above do give warnings (as intended) but I think all three 
cases above
are inimical to the spirit of sweep(): nothing is being "swept" out.

So a warning is appropriate, IMO.

In any case, one can always suppress (or ignore!) a warning if one knows
what one is doing.  YMMV, but if I wanted to do the above operations I 
would
replace


sweep(array(0, dim = c(4,3,2)), c(1,3), 1:12, "+" , give.warning = 
FALSE)

with

  aperm(array(1:12,c(4,2,3)),c(1,3,2))


best wishes

rksh







> I have written an alternative (given below) which does not give 
> warnings in the above cases, but does warn in the following case
>
>> sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)
> , , 1
>
>      [,1] [,2] [,3]
> [1,]    0    3    6
> [2,]    0    3    9
> [3,]    0    6    9
> [4,]    3    6    9
>
> , , 2
>
>      [,1] [,2] [,3]
> [1,]   12   15   18
> [2,]   12   15   21
> [3,]   12   18   21
> [4,]   15   18   21
>
> Warning message:
> STATS does not recycle exactly across MARGIN
>
> The code could be easily modified to warn in other cases, e.g. when 
> length of STATS is a divisor of the corresponding array extent (as in 
> the first example above, with length(STATS) = 2).
>
> The code also includes Gabor's suggestion.
>
> Heather
>
> sweep <- function (x, MARGIN, STATS, FUN = "-", warn = 
> getOption("warn"), ...)
> {
>     FUN <- match.fun(FUN)
>     dims <- dim(x)
>     perm <- c(MARGIN, (1:length(dims))[-MARGIN])
>     if (warn >= 0) {
>         s <- length(STATS)
>         cumDim <- c(1, cumprod(dims[perm]))
>         if (s > max(cumDim))
>             warning("length of STATS greater than length of array",
>                     call. = FALSE)
>         else {
>             upper <- min(ifelse(cumDim > s, cumDim, max(cumDim)))
>             lower <- max(ifelse(cumDim < s, cumDim, min(cumDim)))
>             if (any(upper %% s != 0, s %% lower != 0))
>                 warning("STATS does not recycle exactly across MARGIN",
>                         call. = FALSE)
>         }
>     }
>     FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
> }
>
>>>> Gabor Grothendieck <ggrothendieck at gmail.com> 06/21/05 01:25pm >>>
> \
> Perhaps the signature should be:
>
>    sweep(...other args go here..., warn=getOption("warn"))
>
> so that the name and value of the argument are consistent with
> the R warn option.
>
> On 6/21/05, Robin Hankin <r.hankin at noc.soton.ac.uk> wrote:
>>
>> On Jun 20, 2005, at 04:58 pm, Prof Brian Ripley wrote:
>>
>>> The issue here is that the equivalent command array(1:5, c(6,6)) (to
>>> matrix(1:5,6,6)) gives no warning, and sweep uses array().
>>>
>>> I am not sure either should: fractional recycling was normally 
>>> allowed
>>> in S3 (S4 tightened up a bit).
>>>
>>> Perhaps someone who thinks sweep() should warn could contribute a
>>> tested patch?
>>>
>>
>>
>> OK,  modified R code and Rd file below (is this the best way to do
>> this?)
>>
>>
>>
>>
>> "sweep" <-
>>   function (x, MARGIN, STATS, FUN = "-", give.warning = FALSE, ...)
>> {
>>   FUN <- match.fun(FUN)
>>   dims <- dim(x)
>>   if(give.warning & length(STATS)>1 & any(dims[MARGIN] !=
>> dim(as.array(STATS)))){
>>     warning("array extents do not recycle exactly")
>>   }
>>   perm <- c(MARGIN, (1:length(dims))[-MARGIN])
>>   FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
>> }
>>
>>
>>
>>
>>
>>
>>
>> \name{sweep}
>> \alias{sweep}
>> \title{Sweep out Array Summaries}
>> \description{
>>   Return an array obtained from an input array by sweeping out a 
>> summary
>>   statistic.
>> }
>> \usage{
>> sweep(x, MARGIN, STATS, FUN="-", give.warning = FALSE, \dots)
>> }
>> \arguments{
>>   \item{x}{an array.}
>>   \item{MARGIN}{a vector of indices giving the extents of \code{x}
>>     which correspond to \code{STATS}.}
>>   \item{STATS}{the summary statistic which is to be swept out.}
>>   \item{FUN}{the function to be used to carry out the sweep.  In the
>>     case of binary operators such as \code{"/"} etc., the function 
>> name
>>     must be quoted.}
>>   \item{give.warning}{Boolean, with default \code{FALSE} meaning to
>>   give no warning, even if array extents do not match.  If
>>   \code{TRUE}, check for the correct dimensions and if a
>>   mismatch is detected, give a suitable warning.}
>>   \item{\dots}{optional arguments to \code{FUN}.}
>> }
>> \value{
>>   An array with the same shape as \code{x}, but with the summary
>>   statistics swept out.
>> }
>> \note{
>>   If \code{STATS} is of length 1, recycling is carried out with no
>>   warning irrespective of the value of \code{give.warning}.
>> }
>>
>> \references{
>>   Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
>>   \emph{The New S Language}.
>>   Wadsworth \& Brooks/Cole.
>> }
>> \seealso{
>>   \code{\link{apply}} on which \code{sweep} used to be based;
>>   \code{\link{scale}} for centering and scaling.
>> }
>> \examples{
>> require(stats) # for median
>> med.att <- apply(attitude, 2, median)
>> sweep(data.matrix(attitude), 2, med.att)# subtract the column medians
>>
>> a <- array(0, c(2, 3, 4))
>> b <- matrix(1:8, c(2, 4))
>> sweep(a, c(1, 3), b, "+", give.warning = TRUE)  # no warning:
>> all(dim(a)[c(1,3)] == dim(b))
>> sweep(a, c(1, 2), b, "+", give.warning = TRUE)  # warning given
>>
>> }
>> \keyword{array}
>> \keyword{iteration}
>>
>>
>>
>>
>> --
>> Robin Hankin
>> Uncertainty Analyst
>> National Oceanography Centre, Southampton
>> European Way, Southampton SO14 3ZH, UK
>>  tel  023-8059-7743
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help 
>> PLEASE do read the posting guide! 
>> http://www.R-project.org/posting-guide.html 
>>
>
>
--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743




More information about the R-help mailing list