[Rd] [R] Successive subsets from a vector?

Thomas Lumley tlumley at u.washington.edu
Tue Aug 22 16:54:59 CEST 2006


On Tue, 22 Aug 2006, hadley wickham wrote:

>> The loop method took 195 secs.  Just assigning to an answer of the correct
>> length reduced this to 5 secs.  e.g. use
>>
>>     ADDRESSES <- character(length(VECTOR)-4)
>>
>> Moral: don't grow vectors repeatedly.
>
> Other languages (eg. Java) grow the size of the vector independently
> of the number of observations in it (I think Java doubles the size
> whenever the vector is filled), thus changing O(n) behaviour to O(log
> n).  I've always wondered why R doesn't do this.
>

(redirected to r-devel, a better location for wonder of this type)

This was apparently the intention at the beginning of time, thus the LENGTH 
and TRUELENGTH macros in the source.

In many cases, though, there is duplication as well as length change, e.g.
    x<-c(x, something)
will set NAMED(x) to 2 by the second iteration, forcing duplication at 
each subsequent iteration. The doubling strategy would still leave us with 
O(n) behaviour, just with a smaller constant.
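
As a rough illustration of the point above, here is a minimal sketch comparing 
repeated c() with a preallocated result. The function names and the value of 
n are my own assumptions for the example, not anything from the original 
thread, and the timings will vary by machine and R version.

```r
# Hypothetical helpers for illustration only.
grow_with_c <- function(n) {
  x <- integer(0)
  # c() builds a new vector each time, so every append copies all of x
  for (i in seq_len(n)) x <- c(x, i)
  x
}

preallocate <- function(n) {
  x <- integer(n)                  # one allocation of the final length
  for (i in seq_len(n)) x[i] <- i  # fill in place, no change of length
  x
}

n <- 20000L                        # arbitrary size chosen for the sketch
t_grow <- system.time(a <- grow_with_c(n))[["elapsed"]]
t_pre  <- system.time(b <- preallocate(n))[["elapsed"]]

stopifnot(identical(a, b))         # both approaches give the same vector
cat("grow:", t_grow, "s  preallocate:", t_pre, "s\n")
```

Both functions return the same vector; only the allocation pattern differs.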

The only case I can think of where the doubling strategy actually helps a 
lot is the one in Atte's example, assigning off the end of an existing 
vector.  That wasn't legal in early versions of R (and I think most people 
would agree that it shouldn't be encouraged).
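
For readers unfamiliar with the idiom, "assigning off the end" looks roughly 
like the sketch below. The values are made up for illustration; whether the 
interpreter over-allocates behind the scenes when this happens depends on the 
R version and is not something this example shows.

```r
x <- c(10, 20, 30)

# Index 5 is past the current length of 3: R extends x and
# fills the skipped position 4 with NA.
x[5] <- 50
stopifnot(length(x) == 5, is.na(x[4]), x[5] == 50)

# The append-by-index form of the same idea.
y <- numeric(0)
for (i in 1:3) y[length(y) + 1] <- i * 2
stopifnot(identical(y, c(2, 4, 6)))
```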

A reAllocVector() function would clearly have some benefits, but not as 
many as one would expect. That's probably why it hasn't been done (which 
doesn't mean that it shouldn't be done).

Providing the ability to write assignment functions that don't duplicate 
is a more urgent problem.
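
By "assignment functions" I mean user-written replacement functions of the 
form `f<-`. A minimal sketch follows; `second<-` is a hypothetical name 
invented for this example. Whether the vector is copied on the way through 
depends on the R version, so the sketch only shows the calling convention, 
not the duplication behaviour.

```r
# 'second<-' is a hypothetical replacement function, for illustration only.
`second<-` <- function(x, value) {
  x[2] <- value   # modifies the function's own x; this may involve a copy
  x               # a replacement function must return the modified object
}

v <- c(1, 2, 3)
second(v) <- 99   # evaluated as: v <- `second<-`(v, value = 99)
stopifnot(identical(v, c(1, 99, 3)))
```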


 	-thomas


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle


