[Rd] head.matrix can return 1000s of columns ..

Martin Maechler
Thu Nov 28 15:30:51 CET 2019


>>>>> Gabriel Becker 
>>>>>     on Sat, 2 Nov 2019 12:40:16 -0700 writes:

    [....................]

In the meantime, Gabe worked quite a bit on this and provided a
patch proposal at R's bugzilla, PR#17652,
i.e., here
      https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17652

A few days ago, I committed a (slightly simplified) version
of that to R-devel (svn rev 77462)
with NEWS entry

    * head(x, n) and tail() default and other S3 methods notably for
      _vector_ n, e.g. to get a "corner" of a matrix, also extended for
      array's of higher dimension, thanks to the patch proposal by Gabe
      Becker in PR#16764.

 (which contains a *wrong* PR number, 16764 instead of 17652, that
  I've corrected in the meantime)

A day or so later, CRAN alerted me to the fact that this
change breaks the checks of some CRAN packages, about 30 by now,
it seems.

There were at least two principal reasons, one of which was the
fact that data frame subsetting has been somewhat surprising in R,
without being documented as such, *and* some packages have
inadvertently made use of this peculiarity -- which was
inadvertently changed by r77462.

In short,   head(<data frame>)  kept extraneous attributes
because indeed
                d[i, ]
keeps those attributes ... for data frames.

I will amend the  head() and tail() methods to remain backward
compatible (as much as sensible) for now,  but here's what I've
found about subsetting, i.e., the behavior of the (partly C code
internal)  `[`  methods in R :

1)  For a data frame d,  d[i, ]  differs  from  d[i,j],
    as the former keeps (extra) attributes,
2)  For a matrix both forms of indexing do not keep (extra) attributes.

Here's some simple reproducible R code exhibiting the claim:

##==== Data frame subsetting (vs. matrix, array)  "with extra attributes": =====
## data frame w/ a (non-standard) attribute:
str(treeS <- structure(trees, foo = "bar"))

chkMat <- function(M) {
    stopifnot(nzchar(Mfoo <- attr(M, "foo")),
              length(d <- dim(M)) == 2,
              (n <- d[1]) >= 6, d[2] >= 3)
    ## n = nrow(M)
    stopifnot(exprs = { # attribute is kept
        if(inherits(M, "data.frame")) {
            identical(  attr(M[    1:3 , ] , "foo") , "bar") &&
            identical(  attr(M[(n-2):n , ] , "foo") , "bar")
        } else { ## matrix
            is.null  (  attr(M[    1:3 , ] , "foo")) &&
            is.null  (  attr(M[(n-2):n , ] , "foo"))
        }
        ## OTOH,  [i,j]-indexing of data frames *does* drop "other" attributes:
        inherits(print(t.ij <- M[(n-2):n, 2:3] ), class(M))
        ## now, the "foo" attribute of  M[i,j] is gone!
        is.null(attr(t.ij, "foo"))
    })
}

chkMat(treeS)
chkMat(as.matrix(treeS))
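
A caveat on the matrix call above: as.matrix() on a data frame builds a
new matrix and may not carry the "foo" attribute over, in which case the
matrix branch of the check holds only trivially.  A minimal sketch that
sets the attribute on the matrix itself, so the check has something to
drop (the name 'treesM' is made up for this illustration):

```r
## Sketch: put the extra attribute directly on the matrix.
## 'treesM' is a hypothetical name, not part of the original example.
treesM <- structure(as.matrix(trees), foo = "bar")
n <- nrow(treesM)

## the attribute is really there before subsetting
stopifnot(identical(attr(treesM, "foo"), "bar"))

## both forms of matrix indexing return a matrix without "foo"
m.i  <- treesM[1:3, ]          # [i, ]  indexing
m.ij <- treesM[(n-2):n, 2:3]   # [i, j] indexing
stopifnot(is.matrix(m.i), is.matrix(m.ij),
          is.null(attr(m.i,  "foo")),
          is.null(attr(m.ij, "foo")))
```

Passing  structure(as.matrix(trees), foo = "bar")  to chkMat() should
exercise the same checks through the original function.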

-------

And (to repeat), currently  head(d, n)  is the same as   d[1:n , ]
when  1 <= n <= nrow(d)  and  length(n) == 1,  and this equality is
relied upon by CRAN package code out there .. and hence I'll keep it
with the "generalized" head() & tail() in R-devel.
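
The equality can be checked directly.  This is only a sketch, assuming a
data frame with one extra attribute as in the example above; the names
'd', 'h' and 's' are chosen for this illustration:

```r
## data frame with a non-standard attribute
d <- structure(trees, foo = "bar")
n <- 3L                    # scalar n with 1 <= n <= nrow(d)

h <- head(d, n)            # via the head() method for data frames
s <- d[1:n, ]              # via  `[`  with a missing column index

## compares values, row names and all attributes
identical(h, s)

## the extra attribute as seen through each route
attr(h, "foo")
attr(s, "foo")

## [i, j] indexing, for comparison
attr(d[1:n, 2:3], "foo")
```

Package code that reads an attribute from the result of head() depends
on the first two attr() calls agreeing.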

Martin


