[R] by inconsistently strips class - with fix

Gabor Grothendieck ggrothendieck at gmail.com
Thu Apr 17 13:03:45 CEST 2008


There is a Defaults package on CRAN that allows one
to set default arguments for any function.
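
As a rough sketch of the same idea in base R (this is not the Defaults
package API, which is not shown here), one could wrap tapply so that
simplify defaults to FALSE, as suggested further down the thread. The
name tapply_keep is hypothetical, and the sample times are made up for
illustration.

```r
# Hypothetical wrapper (the name is illustrative, not from any package):
# behaves like tapply(), except that simplify defaults to FALSE, so each
# per-group result is stored as a list element and keeps its class.
tapply_keep <- function(X, INDEX, FUN = NULL, ..., simplify = FALSE) {
  tapply(X, INDEX, FUN, ..., simplify = simplify)
}

# Made-up sample data in the shape of the example quoted below.
times <- as.POSIXct("2008-04-17 12:00:00", tz = "UTC") + 1:3
set <- c(1, 1, 2)

# A single-row group: the case where the default simplification bites.
kept <- tapply_keep(times[1], set[1], function(x) x)

is.list(kept)                   # result is a list-mode array
inherits(kept[[1]], "POSIXct")  # the element is still a date-time
```

The caller can still pass simplify = TRUE explicitly when the simplified
form is wanted.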

On Thu, Apr 17, 2008 at 6:49 AM, Alex Brown <fishtank at compsoc.man.ac.uk> wrote:
> Adding a simplify argument to by would suit me fine.
>
> In my (limited) experience in using R, the automatic simplification
> that R does in various situations is one of its most troublesome
> features.  It means that I cannot expect a program to work even if I
> give it data of the same types as I always have before; any time a
> dimension is reduced to 1 bad things happen.
>
> Is there a master switch I can set so dropping never happens
> automatically?
>
> Can you please have an option that by reads so I can indicate that by
> should never drop/simplify?
>
> -Alex
>
>
> On 17 Apr 2008, at 07:03, Prof Brian Ripley wrote:
>
> > Unfortunately your proposed change changes the type of the output:
> > simplification is intended in many applications of by().
> >
> > Before:
> >
> >> str(by(mytimes$date[1], mytimes$set[1], function(x)x))
> > by [, 1] 1.21e+09
> > - attr(*, "dimnames")=List of 1
> > ..$ mytimes$set[1]: chr "1"
> > - attr(*, "call")= language by.default(data = mytimes$date[1],
> > INDICES = mytimes$set[1],      FUN = function(x) x)
> >
> > After:
> >
> >> str(by(mytimes$date[1], mytimes$set[1], function(x)x))
> > List of 1
> > $ 1: POSIXct[1:1], format: "2008-04-17 06:53:31"
> > - attr(*, "dim")= int 1
> > - attr(*, "dimnames")=List of 1
> > ..$ mytimes$set[1]: chr "1"
> > - attr(*, "call")= language by.default(data = mytimes$date[1],
> > INDICES = mytimes$set[1],      FUN = function(x) x)
> > - attr(*, "class")= chr "by"
> >
> > c() does not do the same thing as unlist() in general, and it is
> > untrue that 'c does not strip class'.  What happens in your example
> > is that there is a c() method for your class (and not many others).
> >
> > What we could do is add a 'simplify' argument to by() so you can
> > control the simplification.
> >
> >
> > On Tue, 15 Apr 2008, Alex Brown wrote:
> >
> >> summary:
> >>
> >> The function 'by' inconsistently strips class from the data to which
> >> it is applied.
> >>
> >> quick reason:
> >>
> >> tapply strips class when simplify is set to TRUE (the default) due to
> >> the class stripping behaviour of unlist.
> >>
> >> quick answer:
> >>
> >> This can be fixed by invoking tapply with simplify=FALSE, or by changing
> >> tapply to use do.call(c, ...) instead of unlist.
> >>
> >> executable example:
> >>
> >> mytimes=data.frame(date = 1:3 + Sys.time(), set = c(1,1,2))
> >>
> >> by(mytimes$date, mytimes$set, function(x)x)
> >>
> >> INDICES: 1
> >> [1] "2008-04-15 11:41:38 BST" "2008-04-15 11:41:39 BST"
> >> ----------------------------------------------------------------------------------------
> >> INDICES: 2
> >> [1] "2008-04-15 11:41:40 BST"
> >>
> >> by(mytimes[1,]$date, mytimes[1,]$set, function(x)x)
> >>
> >> INDICES: 1
> >> [1] 1208256099
> >>
> >> why this is a problem:
> >>
> >> This is a problem when you are feeding the output of this by into a
> >> function which expects the class to be maintained.  I see this problem
> >> when constructing
> >>
> >> reason:
> >>
> >> tapply strips class when simplify is set to TRUE (the default) due to
> >> the behaviour of unlist:
> >>
> >> "Where possible the list elements are coerced to a common mode during
> >> the unlisting, and so the result often ends up as a character vector.
> >> Vectors will be coerced to the highest type of the components in the
> >> hierarchy NULL < raw < logical < integer < real < complex < character
> >> < list < expression: pairlists are treated as lists."
> >>
> >> solution:
> >>
> >> This problem can be fixed by modifying the call to tapply in the
> >> function by.data.frame:
> >>
> >> by.data.frame = function (data, INDICES, FUN, ...)
> >> {
> >> if (!is.list(INDICES)) {
> >>     IND <- vector("list", 1)
> >>     IND[[1]] <- INDICES
> >>     names(IND) <- deparse(substitute(INDICES))[1]
> >> }
> >> else IND <- INDICES
> >> FUNx <- function(x) FUN(data[x, ], ...)
> >> nd <- nrow(data)
> >> <<<<
> >> ans <- eval(substitute(tapply(1:nd, IND, FUNx)), data)
> >> ====
> >> ans <- eval(substitute(tapply(1:nd, IND, FUNx, simplify=FALSE)), data)
> >> >>>>
> >> attr(ans, "call") <- match.call()
> >> class(ans) <- "by"
> >> ans
> >> }
> >>
> >> alternative solution:
> >>
> >> the call in tapply to unlist(ans, recursive=FALSE) can be replaced by
> >> do.call(c, ans) to fix this issue, since c does not strip
> >> class.
> >>
> >> However, I haven't taken the time to work out if this will work in
> >> all cases.
> >>
> >> for example:
> >>
> >> function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
> >> {
> >> FUN <- if (!is.null(FUN))
> >>     match.fun(FUN)
> >> if (!is.list(INDEX))
> >>     INDEX <- list(INDEX)
> >> nI <- length(INDEX)
> >> namelist <- vector("list", nI)
> >> names(namelist) <- names(INDEX)
> >> extent <- integer(nI)
> >> nx <- length(X)
> >> one <- 1L
> >> group <- rep.int(one, nx)
> >> ngroup <- one
> >> for (i in seq.int(INDEX)) {
> >>     index <- as.factor(INDEX[[i]])
> >>     if (length(index) != nx)
> >>         stop("arguments must have same length")
> >>     namelist[[i]] <- levels(index)
> >>     extent[i] <- nlevels(index)
> >>     group <- group + ngroup * (as.integer(index) - one)
> >>     ngroup <- ngroup * nlevels(index)
> >> }
> >> if (is.null(FUN))
> >>     return(group)
> >> ans <- lapply(split(X, group), FUN, ...)
> >> index <- as.integer(names(ans))
> >> if (simplify && all(unlist(lapply(ans, length)) == 1)) {
> >>     ansmat <- array(dim = extent, dimnames = namelist)
> >> <<<<
> >>     ans <- unlist(ans, recursive = FALSE)
> >> ====
> >>     ans <- do.call(c, ans)
> >> >>>>
> >> }
> >> else {
> >>     ansmat <- array(vector("list", prod(extent)), dim = extent,
> >>         dimnames = namelist)
> >> }
> >> if (length(index)) {
> >>     names(ans) <- NULL
> >>     ansmat[index] <- ans
> >> }
> >> ansmat
> >> }
> >>
> >> Alexander Brown
> >> Principal Engineer
> >> Transitive
> >> Maybrook House, 40 Blackfriars Street, Manchester M3 2EG
> >> Phone: +44 (0)161 836 2321    Fax: +44 (0)161 836 2399    Mobile: +44
> >> (0)7980 708 221
> >> www.transitive.com
> >> * The leader in cross-platform virtualization
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > --
> > Brian D. Ripley,                  ripley at stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


