[R] The end of Matlab

Duncan Murdoch murdoch at stats.uwo.ca
Fri Dec 12 18:45:29 CET 2008


On 12/12/2008 12:23 PM, hadley wickham wrote:
> On Fri, Dec 12, 2008 at 11:18 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 12/12/2008 11:38 AM, hadley wickham wrote:
>>>
>>> On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>>>
>>>> On 12/12/2008 8:25 AM, hadley wickham wrote:
>>>>>>
>>>>>> From which you might conclude that I don't like the design of subset,
>>>>>> and you'd be right.  However, I don't think this is a counterexample to
>>>>>> my general rule.  In the subset function, the select argument is treated
>>>>>> as an unevaluated expression, and then there are rules about what to do
>>>>>> with it.  (I.e. try to look up name `a` in the data frame, if that
>>>>>> fails, ...)
>>>>>>
>>>>>> For the requested behaviour to similarly fall within the general rule,
>>>>>> we'd have to treat all indices to all kinds of things (vectors,
>>>>>> matrices, dataframes, etc.) as unevaluated expressions, with special
>>>>>> handling for the particular symbol `end`.
>>>>>
>>>>> Except you wouldn't have to necessarily change indexing - you could
>>>>> change seq instead.  Then 5:end could produce some kind of special
>>>>> data structure (maybe an iterator) that was recognised by the various
>>>>> indexing functions.
>>>>
>>>> Ummm, doesn't that require changes to *both* indexing and seq?
>>>
>>> Ooops, yes.  I meant it wouldn't require indexing to use unevaluated
>>> expressions.
>>>
>>>>> This would still be a lot of work for not a lot
>>>>> of payoff, but it would be a logically consistent way of adding this
>>>>> behaviour to indexing, and the basic work would make it possible to
>>>>> develop other sorts of indexing, eg df[evens(), ], or df[last(5),
>>>>> last(3)].
>>>>
>>>> I agree:  it would be a nice addition, but a fair bit of work.  I think
>>>> it would be quite doable for the indexable things in the base packages,
>>>> but there are a lot of contributed packages that define [ methods, and
>>>> those methods would all need to be modified too.
>>>
>>> That's true, although I suspect many contributed [.methods eventually
>>> delegate to base methods and might work without further modification.
>>>
>>>> (Just to be clear, when I say doable, I'm thinking that your iterators
>>>> return functions that compute subsets of index ranges.  For example,
>>>> evens() might be implemented as
>>>>
>>>> evens <- function() {
>>>>  result <- function(indices) {
>>>>   indices[indices %% 2 == 0]
>>>>  }
>>>>  class(result) <- "iterator"
>>>>  return(result)
>>>> }
>>>>
>>>> and then `[` in v[evens()] would recognize that it had been passed an
>>>> iterator, and would pass 1:length(v) to the iterator to get the subset of
>>>> even indices.  Is that what you had in mind?)
>>>
>>> Yes, that's exactly what I was thinking, although you'd have to put
>>> some thought into the conventions - would it be better to pass in the
>>> length of the vector instead of a vector of indices?  Should all
>>> iterators return logical vectors?  That way you could do x[evens() &
>>> last(5)] to get the even indices out of the last 5, as opposed to
>>> x[evens()][last(5)] which would return the last 5 even indices.
>>
>> Actually, I don't think so.  "evens() & last(5)" would fail to evaluate,
>> because you're trying to do a logical combination of two functions, not of
>> two logical vectors.  Or are we going to extend the logical operators to
>> work on iterators/selectors too?
> 
> Oh yes, that's a good point.  But wouldn't the following do the job?
> 
> "&.selector" <- function(a, b) {
>   function(n) a(n) & b(n)
> }
> 
> or
> 
> "&.selector" <- function(a, b) {
>   function(n) intersect(a(n), b(n))
> }
> 
> depending on whether selectors return logical or numeric vectors.
> Writing functions for | and ! would be similarly easy.  Or am I
> missing something?

No, I think those definitions would be fine, but I'd be concerned about 
speed issues if we start messing with primitives.

While we're at it, we might as well do the same sort of thing for :, and 
define a selector named end, and then 3:end would give a selector from 3 
to the end, which brings us back to the original question.  So it's not 
nearly as intrusive as I thought it would be.
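
A minimal sketch of the selector idea in user-level R, under stated
assumptions: selectors are classed functions that take the length n and
return a logical vector, and every combination is re-wrapped in the class so
that it can be combined again.  The names selector(), from() and pick() are
hypothetical and are not part of base R.  Base `[` does not recognise
selectors, so pick() stands in for it, and since `:` does not dispatch on S3
classes in current R, from(3) stands in for 3:end.

```r
# Wrap a function of n (the length being indexed) in the "selector" class.
selector <- function(f) structure(f, class = "selector")

# Example selectors; each returns a logical vector of length n.
evens <- function() selector(function(n) seq_len(n) %% 2 == 0)
last  <- function(k) selector(function(n) seq_len(n) > n - k)
from  <- function(start) selector(function(n) seq_len(n) >= start)

# Logical operators on selectors return new selectors, so they can be chained.
"&.selector" <- function(e1, e2) selector(function(n) e1(n) & e2(n))
"|.selector" <- function(e1, e2) selector(function(n) e1(n) | e2(n))
"!.selector" <- function(x) selector(function(n) !x(n))

# Stand-in for a selector-aware `[`: pass the length, index with the result.
pick <- function(x, sel) x[sel(length(x))]

x <- 1:20
pick(x, evens() & last(5))         # the even indices among the last 5
pick(pick(x, evens()), last(5))    # the last 5 of the even indices
pick(x, from(3))                   # the analogue of x[3:end]
```

The two combined calls give different answers, which is the distinction
between x[evens() & last(5)] and x[evens()][last(5)] raised earlier in the
thread.  Because the operators dispatch through S3 methods rather than
through changes to the primitives themselves, ordinary logical vectors are
left on their existing code path in this sketch.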

Duncan Murdoch


