[R] Having trouble understanding the sapply/vapply documentation and behaviour of USE.NAMES

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Sat Apr 11 20:09:07 CEST 2020

>>>>> Rolf Turner 
>>>>>     on Fri, 10 Apr 2020 12:23:49 +1200 writes:

    > On 10/04/20 10:59 am, petr smirnov wrote:

    >> I am having trouble parsing the documentation for sapply
    >> and vapply, and I cannot understand if it explains the
    >> different behaviour of USE.NAMES between the two.
    >> I noticed the following different behaviour between the
    >> two functions:
    >> > sapply(c("1"=1, "2"=2, "3"=3), function(x) {r <- list(x); r}, USE.NAMES=FALSE)
    >> $`1`
    >> [1] 1
    >> $`2`
    >> [1] 2
    >> $`3`
    >> [1] 3
    >> > vapply(c("1"=1, "2"=2, "3"=3), function(x) { r <- list(x); r}, FUN.VALUE=list(1), USE.NAMES=FALSE)
    >> [[1]]
    >> [1] 1
    >> [[2]]
    >> [1] 2
    >> [[3]]
    >> [1] 3
    >> In the sapply case, the names of the input vector are
    >> retained. In the vapply case, they are dropped. Note that
    >> this is not true when USE.NAMES=TRUE:
    >> > vapply(c("1"=1, "2"=2, "3"=3), function(x) { r <- list(x); r}, FUN.VALUE=list(1), USE.NAMES=TRUE)
    >> $`1`
    >> [1] 1
    >> $`2`
    >> [1] 2
    >> $`3`
    >> [1] 3
    >> The manual page explains this for the names of the result
    >> of vapply:
    >> The (Dim)names of the array value are taken from the
    >> FUN.VALUE if it is named, otherwise from the result of
    >> the first function call. Column names of the matrix or
    >> more generally the names of the last dimension of the
    >> array value or names of the vector value are set from X
    >> as in sapply.
    >> If this explains the behaviour, could someone break it
    >> down for me and help me understand the reasoning?

1) sapply() has existed longer than R itself, with its current semantics:

  If the original list (or named vector or..) already *has* 'names',
  they are not explicitly removed, and so  USE.NAMES has no
  effect for sapply().

  This is "logical" if you think about what sapply() was created
  for:  sapply(..) := "simplified lapply(..)".
  It then got the *new* option to *add* names when there
  were none in the original lapply() result.

 The very first paragraph on the help page where 'sapply' is
 mentioned reads (with line breaks modified to be even clearer here) :

    'sapply' is a user-friendly version and wrapper of 'lapply' by
    default returning a vector, matrix or, if 'simplify = "array"', an
    array if appropriate, by applying 'simplify2array()'.  
    'sapply(x, f, simplify = FALSE, USE.NAMES = FALSE)'   is the same as
    'lapply(x, f)'.

 From this alone (and some thinking about reasonable lapply() semantics)
 it's easy to conclude that indeed  sapply(*, USE.NAMES=FALSE)
 should *not* remove names that were there already after lapply().
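
 To make this concrete, here is a minimal sketch using the named vector
 from the original question. The names x, f and chr are only
 illustrative. It checks the equivalence quoted from the help page and
 shows that USE.NAMES only *adds* names (for a character X without names);
 it does not strip names that lapply() already produced.

```r
x <- c("1" = 1, "2" = 2, "3" = 3)
f <- function(v) list(v)

# The equivalence stated on the help page
res_l <- lapply(x, f)
res_s <- sapply(x, f, simplify = FALSE, USE.NAMES = FALSE)
same_as_lapply <- identical(res_s, res_l)

# lapply() keeps the names of x, so sapply() keeps them too,
# even with USE.NAMES = FALSE
kept_names <- names(sapply(x, f, USE.NAMES = FALSE))

# USE.NAMES matters when X is an unnamed character vector:
chr <- c("a", "bb", "ccc")
added_names <- names(sapply(chr, nchar))                    # names taken from X
no_names    <- names(sapply(chr, nchar, USE.NAMES = FALSE)) # nothing to add

print(same_as_lapply)
print(kept_names)
print(added_names)
print(no_names)
```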

 {One possibility would be to allow a new *third* option for
  USE.NAMES, say  USE.NAMES="never" .. but I'd tend to consider
  that superfluous and hence just an unnecessary complication}
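
 Without such a third option, names can still be dropped explicitly.
 The following is only a sketch of two possible workarounds, not
 something the help page prescribes; x and sq are illustrative names.

```r
x  <- c("1" = 1, "2" = 2, "3" = 3)
sq <- function(v) v^2

# Option A: strip the names from the result
res_a <- unname(sapply(x, sq))

# Option B: strip the names from the input, so lapply() never sees them
res_b <- sapply(unname(x), sq)

print(names(res_a))
print(names(res_b))
```

 Note that unname() also removes dimnames, so for matrix-valued results
 Option A drops row names as well as column names.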

2) vapply()  was proposed and introduced about two
   decades after sapply() [was introduced into S, R's precursor].

   For that, its implementor must have chosen to use
   'USE.NAMES' more intuitively... and in that case, it also
   makes a *lot* of sense implementation-wise, because vapply()
   does *not* go via lapply().
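
   A small side-by-side sketch of the difference, using a plain numeric
   FUN.VALUE instead of the list from the original question (x and sq are
   illustrative names):

```r
x  <- c("1" = 1, "2" = 2, "3" = 3)
sq <- function(v) v^2

# vapply(): USE.NAMES switches the names of X on or off
v_true  <- vapply(x, sq, FUN.VALUE = numeric(1), USE.NAMES = TRUE)
v_false <- vapply(x, sq, FUN.VALUE = numeric(1), USE.NAMES = FALSE)

# sapply(): names already present after lapply() survive USE.NAMES = FALSE
s_false <- sapply(x, sq, USE.NAMES = FALSE)

print(names(v_true))
print(names(v_false))
print(names(s_false))
```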

    >> Otherwise, is this different behaviour intentional?

yes, see above.

    >> Should it be documented more clearly?

If both you and Rolf find the current help page confusing about
this, I'd  agree that it probably could be improved.
Improvement is not easy, though: as we observe, help pages are
very rarely read carefully nowadays (not even by you in this
case (!), see below), so it's not entirely obvious that adding
verbosity to an existing help page is any improvement.

    > IMHO there is an error in the documentation here.  Clearly
    > USE.NAMES has a different impact on vapply() than it has
    > on sapply() and the documentation does not indicate this,
    > in fact quite the opposite.

Strictly speaking, that is not true; there is *NO* error:
The documentation never explicitly says what happens with USE.NAMES=FALSE,
whereas it clearly and correctly specifies the effect of  USE.NAMES=TRUE,
and everything it says is correct (and, as mentioned initially,
it notably indirectly specifies that sapply(*, USE.NAMES=FALSE)
would *not* remove existing names).


    > cheers,
    > Rolf Turner
    > -- 
    > Honorary Research Fellow Department of Statistics
    > University of Auckland Phone: +64-9-373-7599 ext. 88276
