[Rd] partial matching of row names in [-indexing

Henrik Bengtsson
Thu Jan 20 20:58:50 CET 2022


Although it is implicit in the examples, one thing I don't think anyone
has mentioned is that the partial matching of row names only applies if
the row name is uniquely matched, as in:

> X <- data.frame(a=1:3, b=letters[1:3], row.names=c("A1", "B", "C"))
> X["A", ]
   a b
A1 1 a

If it matches two or more rows, you get:

> X <- data.frame(a=1:3, b=letters[1:3], row.names=c("A1", "A2", "C"))
> X["A", ]
    a    b
NA NA <NA>

just as you would get if there is no match:

> X["A3", ]
    a    b
NA NA <NA>
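
These three cases line up with what pmatch() returns for the same
inputs, which is the function ?"[" points to for partial matching.  A
minimal sketch, assuming (I have not verified it against the source
here) that the data-frame method treats row names the way pmatch() does;
the names rn1 and rn2 are only for illustration:

```r
# Row names from the two data frames above (illustrative variable names)
rn1 <- c("A1", "B", "C")
rn2 <- c("A1", "A2", "C")

unique_hit <- pmatch("A", rn1)    # unique partial match: index of "A1"
ambiguous  <- pmatch("A", rn2)    # matches "A1" and "A2": NA
no_hit     <- pmatch("A3", rn2)   # matches nothing: NA
exact_only <- match("A", rn1)     # exact matching never matches "A1": NA
```

The ambiguous case and the no-match case are indistinguishable from the
result alone, which is the point made above.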

So, the current behavior also depends on what other, similar row names
exist; that is, what works at one point might break when new data are
added to the data frame.
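
To make that concrete, here is a small sketch of the same lookup before
and after one row is appended.  The appended row and the names X2,
before and after are made up for illustration:

```r
X <- data.frame(a = 1:3, b = letters[1:3], row.names = c("A1", "B", "C"))

# "A" partially matches "A1" only, so this returns the value from row "A1"
before <- X["A", "a"]

# Append one row whose name also starts with "A" (hypothetical new data)
X2 <- rbind(X, data.frame(a = 4L, b = "d", row.names = "A2"))

# The very same lookup is now ambiguous and silently yields NA
after <- X2["A", "a"]
```

Nothing in the calling code changed between the two lookups; only the
data did.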

This is a behavior that I think stems from someone thinking it was handy
while working interactively with data.frame:s.  I think it's an
error-prone property when it comes to production code (scripts,
packages, and dynamic documents).  To me, this behavior should be
phased out from R to avoid silent errors and false scientific results.
It's not clear to me how to best deprecate the partial matching,
because of the default behavior of returning NA:s when there is no
match.  This means it can't be just a warning or an error.
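
Until then, production code can sidestep the issue by matching row names
exactly, along the lines of the match() example on the help page.  A
minimal sketch; rows_exact() is a hypothetical helper, not something that
exists in R, and erroring on a missing name is my own design choice
rather than current behavior:

```r
# Hypothetical helper (not part of R): select rows by exact row name only,
# and fail loudly instead of returning a row of NAs.
rows_exact <- function(df, keys) {
  idx <- match(keys, rownames(df))   # exact matching, no partial matches
  if (anyNA(idx)) {
    stop("no exact row-name match for: ",
         paste(keys[is.na(idx)], collapse = ", "))
  }
  df[idx, , drop = FALSE]
}

X <- data.frame(a = 1:3, b = letters[1:3], row.names = c("A1", "B", "C"))
hit  <- rows_exact(X, "A1")                    # the "A1" row
miss <- try(rows_exact(X, "A"), silent = TRUE) # error, not a partial match
```

Whether a missing name should be an error or an NA row is a matter of
taste; the sketch errors because that is what I would want in a script.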

My $.03

/Henrik

On Fri, Jan 14, 2022 at 6:55 PM Ben Bolker <bbolker using gmail.com> wrote:
>
>    Makes sense if you realize that ?"[" only applies to *vector*,
> *list*, and *matrix* indexing and that data frames follow their own
> rules that are documented elsewhere ...
>
>    So yes, not a bug but I claim it's an infelicity. I might submit a
> doc patch.
>
>   FWIW
>
> b["A1",]
> as.matrix(b)["A1",]
>
>   illustrates the difference.
>
>   thanks
>     Ben
>
>
> On 1/14/22 9:19 PM, Steve Martin wrote:
> > I don't think this is a bug in the documentation. The help page for
> > `?[.data.frame` has the following in the last paragraph of the
> > details:
> >
> > Both [ and [[ extraction methods partially match row names. By default
> > neither partially match column names, but [[ will if exact = FALSE
> > (and with a warning if exact = NA). If you want to exact matching on
> > row names use match, as in the examples.
> >
> > The example it refers to is
> >
> > sw <- swiss[1:5, 1:4]  # select a manageable subset
> > sw["C", ] # partially matches
> > sw[match("C", row.names(sw)), ] # no exact match
> >
> > Whether this is good behaviour or not is a different question, but the
> > documentation seems clear enough (to me, at least).
> >
> > Best,
> > Steve
> >
> > On Fri, 14 Jan 2022 at 20:40, Ben Bolker <bbolker using gmail.com> wrote:
> >>
> >>
> >>     People are often surprised that row-indexing a data frame by [ +
> >> character does partial matching (and annoyed that there is no way to
> >> turn it off):
> >>
> >> https://stackoverflow.com/questions/18033501/warning-when-partial-matching-rownames
> >>
> >> https://stackoverflow.com/questions/34233235/r-returning-partial-matching-of-row-names
> >>
> >> https://stackoverflow.com/questions/70716905/why-does-r-have-inconsistent-behaviors-when-a-non-existent-rowname-is-retrieved
> >>
> >>
> >> ?"[" says:
> >>
> >> Character indices can in some circumstances be partially matched
> >>        (see ‘pmatch’) to the names or dimnames of the object being
> >>        subsetted (but never for subassignment).  UNLIKE S (Becker et al.
> >>        p. 358), R NEVER USES PARTIAL MATCHING WHEN EXTRACTING BY ‘[’, and
> >>        partial matching is not by default used by ‘[[’ (see argument
> >>        ‘exact’).
> >>
> >> (EMPHASIS ADDED).
> >>
> >> Looking through the rest of that page, I don't see any other text that
> >> modifies or supersedes that statement.
> >>
> >>     Is this a documentation bug?
> >>
> >> The example given in one of the links above:
> >>
> >> b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2, dimnames =
> >> list(c("A10", "B"), "V1")))
> >>
> >> b["A1",]  ## 4 (partial matching)
> >> b[rownames(b) == "A1",]  ## integer(0)
> >> b["A1", , exact=TRUE]    ## unused argument error
> >> b$V1[["A1"]] ## subscript out of bounds error
> >> b$V1["A1"]   ## NA
> >>
> >> ______________________________________________
> >> R-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> (Acting) Graduate chair, Mathematics & Statistics
