[R] Infelicity in print output with matrix indexing of `[.data.frame`

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sun Dec 18 19:51:29 CET 2016


Ah, "why"... Perhaps because the slowdown from performing successive indexing operations on data frames was considered unacceptable by whoever wrote it? (The code would also essentially have to check whether the result vector needs type conversion as each row of the index matrix is retrieved.) Perhaps for backward compatibility?

You could write your own version that behaves the way you like, but I think the usual expectation is that indexing should be faster than an R `for` loop, so hiding such loop-like behavior behind `[.data.frame` seems a bit deceptive to me.

It seems much more straightforward to me to explicitly convert the portion of the data frame you intend to matrix-index into a matrix of known type, rather than expecting `[.data.frame` to figure out that you don't plan to retrieve values from its non-numeric columns. (Sometimes the fact that things are hard is a hint that you should rethink your solution.)
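A minimal sketch of that approach, using a hypothetical data frame `df` with one character column and two numeric columns. Matrix indexing the whole frame goes through `as.matrix()` and so yields character values; indexing a matrix built from only the numeric columns keeps the result numeric. The one extra step is translating the column numbers in the index matrix into positions within the numeric sub-matrix.

```r
# Hypothetical example data: one character column and two numeric columns
df <- data.frame(id = c("a", "b", "c"),
                 x  = c(1.5, 2.5, 3.5),
                 y  = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# Two-column index matrix: (row 1, column 2 = "x") and (row 3, column 3 = "y")
idx <- cbind(c(1, 3), c(2, 3))

# Indexing the data frame directly: `[.data.frame` calls as.matrix(df), and the
# character column forces the whole matrix, and hence the result, to character
via_df <- df[idx]
print(via_df)

# Explicitly convert only the numeric columns to a numeric matrix
num_cols <- c("x", "y")
m <- as.matrix(df[, num_cols])

# Translate the original column positions into column positions within `m`
idx_m <- cbind(idx[, 1], match(names(df)[idx[, 2]], num_cols))
via_matrix <- m[idx_m]
print(via_matrix)
```

Choosing the columns by name keeps the conversion to a known type explicit instead of relying on `[.data.frame` to guess which columns matter.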

On December 18, 2016 10:00:45 AM PST, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On Dec 17, 2016, at 3:15 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>> 
>> No, I cannot agree. The result of using an n-by-2 matrix to index into
>> a rectangular object is a vector. A vector can only have one storage
>> mode for all elements. Some type coercion is necessary to accommodate
>> this.
>
>I have no argument with the premise that an atomic vector must be of a
>single mode. But the exact same values were assigned, as a numeric
>vector, into the positions indexed by the 2-column matrix. Why does
>extraction need to coerce the entire data frame to a matrix when none
>of the extracted values are character? I suppose my request is that the
>very simple line in `[.data.frame`
>
>
>    if (is.matrix(i)) 
>            return(as.matrix(x)[i])
>
>be replaced by code that extracts only the values needed and then uses
>a shifted version of the selection matrix. That way you could get values
>that were not coerced merely by being innocent bystanders in a data
>frame column that was not relevant:
>
>    as.matrix(x[min(i[, 1]):max(i[, 1]), min(i[, 2]):max(i[, 2])])[
>        cbind(i[, 1] - min(i[, 1]) + 1, i[, 2] - min(i[, 2]) + 1)]
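The subset-then-shift idea quoted above can be wrapped as a standalone helper for experimentation. This is a minimal sketch, not a patch to `[.data.frame`, and the name `extract_by_matrix` is hypothetical. It differs from the quoted one-liner in one respect: instead of taking the full min-to-max bounding box, it keeps only the rows and columns actually referenced, so a character column that merely lies between two requested numeric columns does not force coercion. `drop = FALSE` keeps the subset a data frame even when a single row or column is selected.

```r
# Hypothetical helper (not part of base R): extract values from data frame `x`
# at the (row, column) pairs in the two-column numeric matrix `i`, converting
# only the referenced rows and columns to a matrix
extract_by_matrix <- function(x, i) {
  stopifnot(is.data.frame(x), is.matrix(i), ncol(i) == 2)
  rows <- sort(unique(i[, 1]))
  cols <- sort(unique(i[, 2]))
  # Subset first; drop = FALSE keeps a data frame for a single row or column
  sub <- as.matrix(x[rows, cols, drop = FALSE])
  # Shift the index matrix into the coordinates of the subset
  sub[cbind(match(i[, 1], rows), match(i[, 2], cols))]
}

# Hypothetical example data: column 1 is character, columns 2 and 3 numeric
df <- data.frame(id = c("a", "b", "c"),
                 x  = c(1.5, 2.5, 3.5),
                 y  = c(10, 20, 30),
                 stringsAsFactors = FALSE)
idx <- cbind(c(1, 3), c(2, 3))

res <- extract_by_matrix(df, idx)
print(res)
```

If the index matrix does reference the character column, the subset is still coerced to character, which matches the single-mode constraint discussed above.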


