[R] Successive subsets from a vector?

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Aug 22 12:13:48 CEST 2006

```
embed(VECTOR, 5)[, 5:1]

gives the subsets, so something like

apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse="")

does the job.

The following is a bit more efficient

ind <- 1:(length(VECTOR)-4)
do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))

but by looking at how embed() works it could be made as efficient.

Larger example:

> VECTOR <- sample(1:10, 1e5, replace=TRUE)
> system.time(apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse=""))
[1] 5.73 0.05 5.81   NA   NA
> system.time({ind <- 1:(length(VECTOR)-4)
+ do.call(paste, c(lapply(0:4, function(j) VECTOR[ind+j]), sep=""))
+ })
[1] 1.00 0.01 1.01   NA   NA

The loop method took 195 secs.  Just assigning to an answer of the correct
length reduced this to 5 secs.  e.g. use

Moral: don't grow vectors repeatedly.

On Tue, 22 Aug 2006, kone wrote:

> I'd like to pick every imbricated five character long subsets from a
> vector. I guess there is some efficient way to do this without loops...
> Here is a for-loop-version and a model for output:
>
> VECTOR=c(1,4,2,6,5,0,11,10,4,3,6,8,6);
>

You do not need the semicolons, and they just confuse readers.

> for(i in 1:(length(VECTOR)-4)){
> }
>
> [1] "14265"   "42650"   "265011"  "6501110" "5011104" "0111043" "1110436" "104368"
> [9] "43686"
>
>
> Atte Tenkanen
> University of Turku, Finland

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

```
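
The approaches discussed in the message can be checked against the poster's small example. The sketch below is not code from the thread: the names `expected`, `n`, `by_embed`, `by_paste`, `grown`, and `prealloc` are illustrative, and the two loops are an assumed reconstruction of "growing a vector" versus "assigning to an answer of the correct length", since the loop bodies are missing from the archived text.

```r
# Example vector and expected output, both taken from the question.
VECTOR <- c(1, 4, 2, 6, 5, 0, 11, 10, 4, 3, 6, 8, 6)
expected <- c("14265", "42650", "265011", "6501110", "5011104",
              "0111043", "1110436", "104368", "43686")

n <- length(VECTOR) - 4  # number of length-5 windows

# embed() returns each window with the latest value first, so the
# columns are reversed with [, 5:1] before each row is pasted together.
by_embed <- apply(embed(VECTOR, 5)[, 5:1], 1, paste, collapse = "")

# Vectorised version: five shifted copies of VECTOR pasted elementwise.
ind <- seq_len(n)
by_paste <- do.call(paste, c(lapply(0:4, function(j) VECTOR[ind + j]), sep = ""))

# Assumed "growing" loop: c() copies the whole result on every iteration.
grown <- character(0)
for (i in seq_len(n)) {
  grown <- c(grown, paste(VECTOR[i:(i + 4)], collapse = ""))
}

# Assumed preallocated loop: the result has its final length up front
# and each iteration only fills one slot.
prealloc <- character(n)
for (i in seq_len(n)) {
  prealloc[i] <- paste(VECTOR[i:(i + 4)], collapse = "")
}

# All four approaches should agree with the output shown in the question.
stopifnot(identical(by_embed, expected),
          identical(by_paste, expected),
          identical(grown, expected),
          identical(prealloc, expected))
```

On a vector this short the four versions are indistinguishable in speed; the differences reported in the message appear at lengths such as 1e5, where wrapping each version in `system.time()` would show them.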