[Rd] (PR#8192) [ subscripting sometimes loses names

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Thu Feb 5 13:17:17 CET 2009


it's becoming an old story, but here's a bit to be added.

Peter Dalgaard wrote:
> Duncan Murdoch wrote:
>> On 31/01/2009 7:31 AM, Andrew Piskorski wrote:
>>> This (tangential) discussion really should be a separate thread so I
>>> changed the subject line above.
>>>
>>> On Fri, Jan 30, 2009 at 11:51:00AM -0500, Simon Urbanek wrote:
>>>> Subject: Re: [Rd] (PR#13487) Segfault when mistakenly calling
>>>> [.data.frame
>>>
>>>>> My boss was debugging an issue in our R code.  We have our own
>>>>> "[...."
>>>>> functions, because stock R drops names when subscripting.
>>>> ... if you tell it to do so, yes. If you tell it to not do that,
>>>> it  won't ... ever tried drop=FALSE ?
>>>
>>> Simon, no, the drop=FALSE argument has nothing to do with what
>>> Christian was talking about.  The kind of thing he meant is PR# 8192,
>>> "Subject: [ subscripting sometimes loses names":
>>>
>>>   http://bugs.r-project.org/cgi-bin/R/wishlist?id=8192
>>
>> In that bug report you were asked to provide simple examples, and you
>> didn't.  I imagine that's why there was no action on it.  It is not
>> that easy for someone else to actually find the simple example that
>> led you to print
>>
>>      $vec.1
>> BAD  $vec.1[[1]]           $vec.1[[2]]
>>         a    c <NA>         a  c no
>>         1    3   NA         1  3 NA
>>
>> I just tracked this one down, and can put together this simple example:
>>
>>  > (1:3)["no"]
>> [1] NA
>>
>> where I think you would want the name "no" attached to the output. 
>> (Or maybe your more complicated example is wanted?  You don't
>> explain.)  But that looks like documented behaviour to me:  according
>> to my reading of "Indexing by vectors" in the R Language Definition
>> manual, it should give the same answer as (1:3)[4], and it does.  So
>> it's not a bug, but a wishlist item.
>>
>> And the other two cases where you list "BAD" behaviour?  I didn't
>> track them down.
>
> I did, and they boil down to variations of
>
> > data.frame(val=1:3,row.names=letters[1:3])[,1]
> [1] 1 2 3
>
> but it's not obvious that the result should be named using the
> row.names and (in particular) whether or why it should differ from
> .....[[1]] and ....$val. 

if you say that, be prepared to explain why it should *not*
differ from [[1]] and $val.  reading ?'[' carefully, you'll find:

"     The most important distinction between '[', '[[' and '$' is that
     the '[' can select more than one element whereas the other two
     select a single element."

that's actually quite enough to justify why [,1] (or rather [, indices],
with an arbitrary vector of indices) should differ from [[1]] and $val,
precisely because:

a) [[index]] and $name are *guaranteed* to return one column (or fail),
so it's reasonable to *always* drop the dimension -- because it will be
done in the case of every successful selection;

b) [, indices] *may* or *may not* return one column in a successful
selection, and now dropping the dimension (and names) depends not on the
type of the indices used (positive numeric, negative numeric, character,
whatever), but on the length of the index vector.
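
a minimal sketch of a) and b), assuming a small data frame d that extends
the one from the example quoted above with a second column (the names d,
other and idx are made up for illustration only):

```r
# a small data frame with row names; 'other' is a made-up second column
d <- data.frame(val = 1:3, other = 4:6, row.names = letters[1:3])

# a) [[ and $ select exactly one column, so the result is always a plain vector
is.data.frame(d[[1]])      # FALSE
is.data.frame(d$val)       # FALSE

# b) [, indices] keeps the data frame when more than one column is selected ...
is.data.frame(d[, 1:2])    # TRUE
rownames(d[, 1:2])         # "a" "b" "c"

# ... but drops to a bare, unnamed vector when only one column is selected
is.data.frame(d[, 1])      # FALSE
names(d[, 1])              # NULL

# the same happens when a computed index vector happens to have length one
idx <- which(names(d) == "val")
is.data.frame(d[, idx])    # FALSE
```

the last case is the one that bites in practice: code that computes idx
gets back a different type of object depending on how many columns matched.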

why is external consistency of [ (being like [[ and $) when a single
index is used more important than its internal consistency (returning
the same type of data -- a data frame, or a like-dimensioned matrix --
irrespective of the length of the index vector)?

i realize that the issue of drop=FALSE vs drop=TRUE as the default has
been discussed before, but i don't find clear arguments given for
drop=TRUE as the default, beyond that it just is so and that much old
code would break if it were changed.  i'm actually not hoping for this
to be changed, but for users not to be blamed for assuming [,1] returns
a data frame with row names.  it's *not* their fault they are wrong.
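
for completeness, a small sketch of what such a user has to write to get
what they assumed; d is the data frame from the example quoted above, and
the names x and v are made up for illustration.  the setNames() line is
just one possible way of attaching the row names by hand, not a
recommendation from this thread:

```r
d <- data.frame(val = 1:3, row.names = letters[1:3])

# asking explicitly not to drop keeps the data frame and its row names
x <- d[, 1, drop = FALSE]
is.data.frame(x)    # TRUE
rownames(x)         # "a" "b" "c"

# a named vector has to be built by hand from the column and the row names
v <- setNames(d[[1]], rownames(d))
names(v)            # "a" "b" "c"
```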


> Given that for most purposes, extracting the relevant names would just
> be unnecessary red tape, I'd say that we can do without it.
>

would keeping the dimensions and class be just unnecessary red tape,
too?  can you know what most users' purposes are?

vQ


