[Rd] partial matching of row names in [-indexing

Ben Bolker <bbolker using gmail.com>
Thu Jan 20 21:02:37 CET 2022


   FWIW there is also a discussion of this on bugzilla:

https://bugs.r-project.org/show_bug.cgi?id=18278
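
For anyone who wants exact matching today, here is a minimal sketch of the
match()-based workaround that ?"[.data.frame" points to. The helper name
exact_rows() is hypothetical (it is not part of base R); the data frame is
the one from Henrik's message quoted below.

```r
# Hypothetical helper (not in base R): look up rows by *exact* row name.
# Keys with no exact match yield an all-NA row, as with match() in the
# ?"[.data.frame" examples.
exact_rows <- function(df, keys) {
  df[match(keys, rownames(df)), , drop = FALSE]
}

X <- data.frame(a = 1:3, b = letters[1:3], row.names = c("A1", "B", "C"))

partial <- X["A", ]             # "A" partially (and uniquely) matches row "A1"
exact   <- exact_rows(X, "A")   # no exact match: a single all-NA row
hit     <- exact_rows(X, "A1")  # exact match: row "A1"
```

Because match() never does partial matching, the result of exact_rows() does
not change when other, similar row names are later added to the data frame.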

On 1/20/22 2:58 PM, Henrik Bengtsson wrote:
> Although it is implicit, what I don't think anyone has mentioned is that
> the partial matching of row names only applies if the row name is
> uniquely matched, as in:
> 
>> X <- data.frame(a=1:3, b=letters[1:3], row.names=c("A1", "B", "C"))
>> X["A", ]
>     a b
> A1 1 a
> 
> If it matches two or more rows, you get:
> 
>> X <- data.frame(a=1:3, b=letters[1:3], row.names=c("A1", "A2", "C"))
>> X["A", ]
>      a    b
> NA NA <NA>
> 
> just as you would get if there is no match:
> 
>> X["A3", ]
>      a    b
> NA NA <NA>
> 
> So, the current behavior also depends on which other similar row
> names are present; that is, what works at one point might break when new
> data are added to the data frame.
> 
> I think this behavior stems from someone having thought it handy
> while working interactively with data.frame:s.  I think
> it's an error-prone property when it comes to production code (scripts,
> packages, and dynamic documents).  To me, this behavior should be
> phased out from R to avoid silent errors and false scientific results.
> It's not clear to me how to best deprecate the partial matching,
> because of the default behavior of returning NA:s when there is no
> match.  This means it can't be just a warning or an error.
> 
> My $.03
> 
> /Henrik
> 
> On Fri, Jan 14, 2022 at 6:55 PM Ben Bolker <bbolker using gmail.com> wrote:
>>
>>     Makes sense if you realize that ?"[" only applies to *vector*,
>> *list*, and *matrix* indexing and that data frames follow their own
>> rules that are documented elsewhere ...
>>
>>     So yes, not a bug but I claim it's an infelicity. I might submit a
>> doc patch.
>>
>>    FWIW
>>
>> b["A1",]
>> as.matrix(b)["A1",]
>>
>>    illustrates the difference.
>>
>>    thanks
>>      Ben
>>
>>
>> On 1/14/22 9:19 PM, Steve Martin wrote:
>>> I don't think this is a bug in the documentation. The help page for
>>> `?[.data.frame` has the following in the last paragraph of the
>>> details:
>>>
>>> Both [ and [[ extraction methods partially match row names. By default
>>> neither partially match column names, but [[ will if exact = FALSE
>>> (and with a warning if exact = NA). If you want to exact matching on
>>> row names use match, as in the examples.
>>>
>>> The example it refers to is
>>>
>>> sw <- swiss[1:5, 1:4]  # select a manageable subset
>>> sw["C", ] # partially matches
>>> sw[match("C", row.names(sw)), ] # no exact match
>>>
>>> Whether this is good behaviour or not is a different question, but the
>>> documentation seems clear enough (to me, at least).
>>>
>>> Best,
>>> Steve
>>>
>>> On Fri, 14 Jan 2022 at 20:40, Ben Bolker <bbolker using gmail.com> wrote:
>>>>
>>>>
>>>>      People are often surprised that row-indexing a data frame by [ +
>>>> character does partial matching (and annoyed that there is no way to
>>>> turn it off):
>>>>
>>>> https://stackoverflow.com/questions/18033501/warning-when-partial-matching-rownames
>>>>
>>>> https://stackoverflow.com/questions/34233235/r-returning-partial-matching-of-row-names
>>>>
>>>> https://stackoverflow.com/questions/70716905/why-does-r-have-inconsistent-behaviors-when-a-non-existent-rowname-is-retrieved
>>>>
>>>>
>>>> ?"[" says:
>>>>
>>>> Character indices can in some circumstances be partially matched
>>>>         (see ‘pmatch’) to the names or dimnames of the object being
>>>>         subsetted (but never for subassignment).  UNLIKE S (Becker et al.
>>>>         p. 358), R NEVER USES PARTIAL MATCHING WHEN EXTRACTING BY ‘[’, and
>>>>         partial matching is not by default used by ‘[[’ (see argument
>>>>         ‘exact’).
>>>>
>>>> (EMPHASIS ADDED).
>>>>
>>>> Looking through the rest of that page, I don't see any other text that
>>>> modifies or supersedes that statement.
>>>>
>>>>      Is this a documentation bug?
>>>>
>>>> The example given in one of the links above:
>>>>
>>>> b <- as.data.frame(matrix(4:5, ncol = 1, nrow = 2, dimnames =
>>>> list(c("A10", "B"), "V1")))
>>>>
>>>> b["A1",]  ## 4 (partial matching)
>>>> b[rownames(b) == "A1",]  ## integer(0)
>>>> b["A1", , exact=TRUE]    ## unused argument error
>>>> b$V1[["A1"]] ## subscript out of bounds error
>>>> b$V1["A1"]   ## NA
>>>>
>>>> ______________________________________________
>>>> R-devel using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> --
>> Dr. Benjamin Bolker
>> Professor, Mathematics & Statistics and Biology, McMaster University
>> Director, School of Computational Science and Engineering
>> (Acting) Graduate chair, Mathematics & Statistics
>>



