[R] The end of Matlab

Fri Dec 12 18:18:22 CET 2008

On 12/12/2008 11:38 AM, hadley wickham wrote:
> On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 12/12/2008 8:25 AM, hadley wickham wrote:
>>>>
>>>> From which you might conclude that I don't like the design of subset, and
>>>> you'd be right.  However, I don't think this is a counterexample to my
>>>> general rule.  In the subset function, the select argument is treated as
>>>> an unevaluated expression, and then there are rules about what to do with
>>>> it.  (I.e. try to look up name `a` in the data frame, if that fails, ...)
>>>>
>>>> For the requested behaviour to similarly fall within the general rule,
>>>> we'd have to treat all indices to all kinds of things (vectors, matrices,
>>>> dataframes, etc.) as unevaluated expressions, with special handling for
>>>> the particular symbol `end`.
>>>
>>> Except you wouldn't have to necessarily change indexing - you could
>>> change seq instead.  Then 5:end could produce some kind of special
>>> data structure (maybe an iterator) that was recognised by the various
>>> indexing functions.
>>
>> Ummm, doesn't that require changes to *both* indexing and seq?
> 
> Oops, yes.  I meant it wouldn't require indexing to use unevaluated
> expressions.
> 
>>> This would still be a lot of work for not a lot
>>> of payoff, but it would be a logically consistent way of adding this
>>> behaviour to indexing, and the basic work would make it possible to
>>> develop other sorts of indexing, e.g. df[evens(), ], or df[last(5),
>>> last(3)].
>>
>> I agree:  it would be a nice addition, but a fair bit of work.  I think it
>> would be quite doable for the indexable things in the base packages, but
>> there are a lot of contributed packages that define [ methods, and those
>> methods would all need to be modified too.
> 
> That's true, although I suspect many contributed [.methods eventually
> delegate to base methods and might work without further modification.
> 
>> (Just to be clear, when I say doable, I'm thinking that your iterators
>> return functions that compute subsets of index ranges.  For example, evens()
>> might be implemented as
>>
>> evens <- function() {
>>  result <- function(indices) {
>>    indices[indices %% 2 == 0]
>>  }
>>  class(result) <- "iterator"
>>  return(result)
>> }
>>
>> and then `[` in v[evens()] would recognize that it had been passed an
>> iterator, and would pass 1:length(v) to the iterator to get the subset of
>> even indices.  Is that what you had in mind?)
> 
> Yes, that's exactly what I was thinking, although you'd have to put
> some thought into the conventions - would it be better to pass in the
> length of the vector instead of a vector of indices?  Should all
> iterators return logical vectors?  That way you could do x[evens() &
> last(5)] to get the even indices out of the last 5, as opposed to
> x[evens()][last(5)] which would return the last 5 even indices.

Actually, I don't think so.  "evens() & last(5)" would fail to evaluate, 
because you're trying to do a logical combination of two functions, not 
of two logical vectors.  Or are we going to extend the logical operators 
to work on iterators/selectors too?
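
To make that concrete, here is a rough sketch of what such an extension
might look like.  Everything in it is hypothetical: the "selector" class,
the `&.selector` method, and the helper pick() (which stands in for a
modified `[`) do not exist in base R.  It also assumes the convention that
a selector takes the length of the vector and returns a logical vector.

```r
# Hypothetical sketch: a "selector" is a function from a length n to a
# logical vector of length n.  None of these names exist in base R.
selector <- function(f) structure(f, class = "selector")

evens <- function() selector(function(n) seq_len(n) %% 2 == 0)
last  <- function(k) selector(function(n) seq_len(n) > n - k)

# Without a method, `&` applied to two plain functions is an error.
plain <- tryCatch((function(n) TRUE) & (function(n) TRUE),
                  error = function(e) "error")

# Extending `&` to selectors: combine lazily, evaluate once n is known.
`&.selector` <- function(e1, e2) {
  selector(function(n) e1(n) & e2(n))
}

# Stand-in for a modified `[` that recognises selectors.
pick <- function(x, s) x[s(length(x))]

x <- 1:20
pick(x, evens() & last(5))       # 16 18 20: even indices among the last 5
pick(pick(x, evens()), last(5))  # 12 14 16 18 20: last 5 even indices
```

The same pattern would be needed for `|` and `!`, so in practice one
would probably write a single Ops group method rather than one method per
operator.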

Duncan Murdoch

> You could also imagine similar iterators for random sampling, like
> samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
> with replacement.  first(n) could also be useful, selecting the first
> min(n, length(vector)) observations.   An iterator version of rev()
> would also be handy.
> 
> Maybe selector would be a better name than iterator though, as these
> don't have the same feel as iterators in other languages.
> 
> Hadley
>
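
The sampling and ordering selectors mentioned in the quoted text above can
be sketched in the same style.  This is only an illustration under assumed
conventions: each selector takes the length of the vector and returns
integer indices, pick() is again a hypothetical stand-in for a modified
`[`, and reversed() is a made-up name for the selector version of rev().
Integer indices are used here because a logical mask cannot express
reordering or sampling with replacement.

```r
# Hypothetical sketch: each selector maps a length n to integer indices.
first    <- function(k) function(n) seq_len(min(k, n))
samp     <- function(p) function(n) sample.int(n, size = round(p * n))
boot     <- function(p) function(n) sample.int(n, size = round(p * n),
                                               replace = TRUE)
reversed <- function() function(n) rev(seq_len(n))

# Stand-in for a modified `[` that recognises selectors.
pick <- function(x, s) x[s(length(x))]

x <- letters[1:10]
pick(x, first(3))            # "a" "b" "c"
pick(x, first(50))           # all ten letters: first() is capped at length(x)
pick(x, reversed())          # "j" "i" ... "a"
length(pick(x, samp(0.2)))   # 2 indices drawn without replacement
length(pick(x, boot(0.8)))   # 8 indices drawn with replacement
```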