[Rd] sapply improvements

William Dunlap wdunlap at tibco.com
Wed Nov 4 22:04:56 CET 2009


> -----Original Message-----
> From: William Dunlap 
> Sent: Wednesday, November 04, 2009 12:53 PM
> To: 'Duncan Murdoch'
> Cc: r-devel at r-project.org
> Subject: RE: sapply improvements
> 
> It looks good on the following examples:
> 
> > z <- split(log(1:10), rep(letters[1:2],c(3,7)))
> > sapply(z, length, FUN.VALUE=numeric(1))
> Error in sapply(z, length, FUN.VALUE = numeric(1)) : 
>   FUN values must be of type 'double'
> 
> (I'd like the error to say "... must be of type 'double',
> not 'integer'", to give the user a fuller diagnosis of
> the problem.)

If this new argument gets used much, it may give a
push towards getting functions to always return the
same type of output.  E.g., range(integer(0)) returns
a numeric (double) while range(integer(1)) returns an
integer, resulting in:
   > z<-split(1:10, cut(log(1:10),breaks=0:4,include.lowest=TRUE))
   > # z[[4]] is integer(0)
   > sapply(z,range,FUN.VALUE=integer(2))
   Error in sapply(z, range, FUN.VALUE = integer(2)) : 
     FUN values must be of type 'integer'
   In addition: Warning messages:
   1: In min(x) : no non-missing arguments to min; returning Inf
   2: In max(x) : no non-missing arguments to max; returning -Inf
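
The type mismatch itself can be checked directly. This is only a minimal
sketch using base R; suppressWarnings() is there just to keep the min/max
warnings out of the way, and the variable names are made up for the
illustration.

```r
# range() on a zero-length integer vector falls back to c(Inf, -Inf),
# which can only be stored as a double.
empty_type <- typeof(suppressWarnings(range(integer(0))))

# range() on a non-empty integer vector keeps the integer type.
nonempty_type <- typeof(range(integer(1)))

print(c(empty = empty_type, nonempty = nonempty_type))
```

So any FUN.VALUE that is strict about typeof() will reject one of the two
cases, whichever template is chosen.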

> 
> > sapply(z, range, FUN.VALUE=c(Min=0,Max=0))
>            a        b
> Min 0.000000 1.386294
> Max 1.098612 2.302585
> 
> Exactly matching the typeof's and using the names
> for row.names on matrix output seem good to me.
>  
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com  
> 
> > -----Original Message-----
> > From: Duncan Murdoch [mailto:murdoch at stats.uwo.ca] 
> > Sent: Wednesday, November 04, 2009 12:24 PM
> > To: William Dunlap
> > Cc: michael.m.spiegel at gmail.com; r-devel at stat.math.ethz.ch
> > Subject: sapply improvements
> > 
> > On 11/4/2009 12:15 PM, William Dunlap wrote:
> > >> -----Original Message-----
> > >> From: r-devel-bounces at r-project.org 
> > >> [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
> > >> Sent: Wednesday, November 04, 2009 8:47 AM
> > >> To: michael.m.spiegel at gmail.com
> > >> Cc: R-bugs at r-project.org; r-devel at stat.math.ethz.ch
> > >> Subject: Re: [Rd] error in install.packages() (PR#14042)
> > >> 
> ... 
> > >> For future reference:  the problem was that it assigned the result of
> > >> sapply() to a subset of a vector.  Normally sapply() simplifies its
> > >> result to a vector, but in this case the result was empty, so sapply()
> > >> returned an empty list; assigning a list to a vector coerced the vector
> > >> to a list, and then the "invalid subscript type 'list'" came soon after.
> > > 
> > > I've run into this sort of problem a lot (0-long input to sapply
> > > causes it to return list()).  A related problem is that when sapply's
> > > FUN doesn't always return the type of value you expect for some
> > > corner case then sapply won't do the expected simplification.  If
> > > sapply had an argument that gave the expected form of FUN's output
> > > then sapply could (a) die if some call to FUN didn't return something
> > > of that form and (b) return a 0-long object of the correct form
> > > if sapply's X has length zero so FUN is never called.  E.g.,
> > >    sapply(2:0, function(i)(11:20)[i], FUN.VALUE=integer(1)) # die on third iteration
> > >    sapply(integer(0), function(i)i>0, FUN.VALUE=logical(1)) # return logical(0)
> > > 
> > > Another benefit of sapply knowing the type of FUN's return value is
> > > that it wouldn't have to waste space creating a list of FUN's return
> > > values but could stuff them directly into the final output structure.
> > > A list of n scalar doubles is 4.5 times bigger than double(n) and the
> > > factor is 9.0 for integers and logicals.
> > 
> > 
> > What do you think of the behaviour of the sapply function below?  (I
> > wouldn't put it into R as it is, I'd translate it to C code to avoid the
> > lapply call; but I'd like to get the behaviour right before doing that.)
> > 
> > This one checks that the length() and typeof() results are consistent.
> > If the FUN.VALUE has names, those are used (but it doesn't require the
> > names from FUN to match).
> ...
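
For readers following the thread, the behaviour discussed above (check
length() and typeof() against FUN.VALUE, take row names from FUN.VALUE,
return a 0-long object of the right type for empty input) might be sketched
roughly as follows. This is not the function Duncan posted, which is elided
above; sapply_checked is a hypothetical name, the error wording is only
modelled on the messages quoted earlier, and it is plain R rather than the
planned C code.

```r
# Hypothetical helper, NOT the function from the thread: each FUN result is
# checked against the template FUN.VALUE by length() and typeof().
sapply_checked <- function(X, FUN, FUN.VALUE, ...) {
  FUN <- match.fun(FUN)
  n <- length(FUN.VALUE)
  type <- typeof(FUN.VALUE)
  # Allocate the final result up front rather than building a list first.
  out <- vector(type, n * length(X))
  for (i in seq_along(X)) {
    val <- FUN(X[[i]], ...)
    if (length(val) != n)
      stop("FUN values must be of length ", n, ", not ", length(val))
    if (typeof(val) != type)
      stop("FUN values must be of type '", type, "', not '", typeof(val), "'")
    out[(i - 1L) * n + seq_len(n)] <- val
  }
  if (n == 1L) {
    names(out) <- names(X)
  } else {
    # Row names come from FUN.VALUE; names on FUN's own results are ignored.
    dim(out) <- c(n, length(X))
    dimnames(out) <- list(names(FUN.VALUE), names(X))
  }
  out
}

z <- split(log(1:10), rep(letters[1:2], c(3, 7)))
print(sapply_checked(z, range, FUN.VALUE = c(Min = 0, Max = 0)))
print(sapply_checked(integer(0), function(i) i > 0, FUN.VALUE = logical(1)))
```

With this sketch, sapply_checked(z, length, FUN.VALUE = numeric(1)) stops
with a type error because length() returns an integer, and the 2:0 example
stops on the third element because (11:20)[0] has length zero.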


