[Rd] HOW TO AVOID LOOPS

carlos martinez martinezbula at earthlink.net
Sun Apr 13 03:33:27 CEST 2008


I appreciate the ingenious and effective suggestions and feedback from:

Dan Davison
Vincent Goulet
Martin Morgan
Hadley Wickham

The variety of technical approaches proposed so far is clear proof of the
strong and flexible capabilities of the R system, and especially of the
dynamism and technical understanding of the R user base.

We tested all four recommendations with an input vector of more than 850,000
elements and got run times ranging from about 40 seconds down to about 20 seconds.

All four approaches produced the desired vector. Wickham's approach
produced an extra vector, but the second vector had the correct
format.
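A minimal sketch of how such a comparison might be set up in R, assuming a random 0/1 input that begins with 0 (the approaches of Hadley Wickham and Vincent Goulet both assume a leading 0). The wrapper names, the seed and the default size are illustrative choices, not the exact benchmark described above. The Wickham wrapper calls unname(), because split() plus unlist() attach element names, which is the most likely source of the "extra vector" when that result is printed.

```r
# Hypothetical wrappers around the four suggestions in this thread
goulet <- function(x) {
  ind <- which(x == 0)                       # start of each 0-led segment
  ends <- c(tail(ind, -1) - 1, length(x))    # last index of each segment
  unlist(lapply(mapply(seq, ind, ends, SIMPLIFY = FALSE),
                function(y) cumsum(x[y])))
}
davison <- function(x) {                     # Dan Davison's rle() version
  runs <- rle(x)
  runlengths <- runs$lengths[runs$values == 1]
  x[x == 1] <- unlist(lapply(runlengths, function(n) 1:n))
  x
}
morgan <- function(x) x * unlist(lapply(rle(x)$lengths, seq))
# unname() drops the names that split() + unlist() attach
wickham <- function(x) unname(unlist(lapply(split(x, cumsum(x == 0)), seq_along)) - 1)

set.seed(1)
n <- 1e5  # raise to 850000 to match the size mentioned in the text
x <- c(0, rbinom(n - 1, 1, 0.5))  # leading 0 assumed by goulet() and wickham()

approaches <- list(goulet = goulet, davison = davison,
                   morgan = morgan, wickham = wickham)
results <- lapply(approaches, function(f) f(x))

# All four should agree element by element
agree <- sapply(results, function(r) isTRUE(all.equal(r, results$davison)))
print(agree)

# Elapsed seconds per approach; the figures depend on the machine
elapsed <- sapply(approaches, function(f) system.time(f(x))[["elapsed"]])
print(elapsed)
```

Checking agreement before timing guards against comparing a fast but wrong variant.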

Just one additional follow-up: from the same input vector
c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)

how can I obtain a vector of the following format?
(0,0,1,0,0,0,3,0,0,0,2,0,1,0,0,0,0,0,0,6)

Would it be easier and more efficient to start from the original input vector,
or from the second vector above:
(0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)?

Thanks for your responses.
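One possible answer to the follow-up, sketched under the assumption that the target is the length of each run of 1s written at that run's last position, with zeros elsewhere and the same length as the input. Both starting points work; the function names are hypothetical. From the original vector, rle() gives the run lengths and cumsum() of those lengths gives each run's end position. From the running-count vector, an element is kept only where the next element does not continue the count (the difference to the next element is not 1).

```r
# From the original 0/1 vector: write each run length at the end of its run
run_totals_from_x <- function(x) {
  r <- rle(x)
  run_ends <- cumsum(r$lengths)   # index of the last element of each run
  ones <- r$values == 1           # only runs of 1s are reported
  out <- numeric(length(x))
  out[run_ends[ones]] <- r$lengths[ones]
  out
}

# From the running-count vector: keep y[i] only where the count stops growing
run_totals_from_counts <- function(y) {
  # diff(y)[i] == 1 means element i + 1 continues the current run;
  # the final element always closes whatever run it belongs to
  y * c(diff(y) != 1, TRUE)
}

x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
y <- c(0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)
run_totals_from_x(x)
run_totals_from_counts(y)
# both: 0 0 1 0 0 0 3 0 0 0 2 0 1 0 0 0 0 0 0 6
```

Starting from the original vector avoids building the running counts first; starting from the counts is a single vectorized pass when those counts already exist.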

-------------------------------------------------------------------------
Hadley Wickham's Approach

How about:

unlist(lapply(split(x, cumsum(x == 0)), seq_along)) - 1

Hadley
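Printing the result of this one-liner shows a row of names above the values: split() labels each group by its cumsum() id, and unlist() combines those labels into element names such as "21" and "22". A small variation (a sketch, not Hadley's exact code) passes use.names = FALSE to unlist(). Like the original, it assumes the vector starts with 0; a leading run of 1s would be counted from 0 instead of 1.

```r
x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
# cumsum(x == 0) gives each 0-led segment its own group id
groups <- cumsum(x == 0)
# seq_along() numbers each segment 1, 2, ...; subtracting 1 makes the leading 0 count as 0
counts <- unlist(lapply(split(x, groups), seq_along), use.names = FALSE) - 1
counts
```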
--------------------------------------------------------------------------
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
Sent: Saturday, April 12, 2008 5:00 PM
To: Dan Davison
Cc: martinezbula at earthlink.net
Subject: Re: [Rd] HOW TO AVOID LOOPS

(anonymous 'off-list' response; some extra calcs but tidy)

> x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
> x * unlist(lapply(rle(x)$lengths, seq))
 [1] 0 0 1 0 1 2 3 0 0 1 2 0 1 0 1 2 3 4 5 6
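The lapply()/seq() step builds 1:n for every run length n and concatenates the pieces; base R's sequence() does the same in a single call. A fully vectorized variant (a sketch, not benchmarked here) follows. It relies, as the original does, on x being coded strictly as 0/1 so that multiplying by x zeroes the counts inside runs of 0s, and it also works when x starts with 1.

```r
x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
# sequence(c(2, 1, 3)) is 1 2 1 1 2 3, i.e. seq_len() of each length, concatenated
counts <- x * sequence(rle(x)$lengths)
counts
```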


Dan Davison <davison at stats.ox.ac.uk> writes:

> On Sat, Apr 12, 2008 at 06:45:00PM +0100, Dan Davison wrote:
>> On Sat, Apr 12, 2008 at 01:30:13PM -0400, Vincent Goulet wrote:
>> > On Sat. 12 Apr. at 12:47, carlos martinez wrote:
>> > >> Looking for a simple, effective, minimum-execution-time solution.
>> > >>
>> > >> For a vector as:
>> > >>
>> > >> c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>> > >>
>> > > To transform it to the following vector without using any loops:
>> > >
>> > >> (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)
>> > >>
>> > > Appreciate any suggestions.
>> > 
>> > This does it -- but it is admittedly ugly:
>> > 
>> >  > x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>> >  > ind <- which(x == 0)
>> >  > unlist(lapply(mapply(seq, ind, c(tail(ind, -1) - 1, length(x))),
>> >  +               function(y) cumsum(x[y])))
>> >   [1] 0 0 1 0 1 2 3 0 0 1 2 0 1 0 1 2 3 4 5 6
>> > 
>> > (The mapply() part is used to create the indexes of each sequence 
>> > in x starting with a 0. The rest is then straightforward.)
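One caveat, sketched below: mapply() simplifies its result to a matrix when every segment has the same length, and lapply() over a matrix visits single elements rather than segments, so the running sums are lost. Passing SIMPLIFY = FALSE keeps a list in every case. The wrapper name goulet_counts is hypothetical; like the original, it assumes x starts with 0 and contains at least one 0.

```r
goulet_counts <- function(x) {
  ind <- which(x == 0)                      # start of each 0-led segment
  ends <- c(tail(ind, -1) - 1, length(x))   # last index of each segment
  # SIMPLIFY = FALSE: always a list of index vectors, never a matrix
  segments <- mapply(seq, ind, ends, SIMPLIFY = FALSE)
  unlist(lapply(segments, function(y) cumsum(x[y])))
}

# Equal-length segments: with the default SIMPLIFY = TRUE this input
# would come back as 0 1 1 0 1 1 instead of the running counts
goulet_counts(c(0, 1, 1, 0, 1, 1))
```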
>> 
>> 
>> Here's my effort. Maybe a bit easier to digest? Only one *apply, so
>> probably more efficient.
>> 
>> function(x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)) {
>>     d <- diff(c(0,x,0))
>>     starts <- which(d == 1)
>>     ends <- which(d == -1)
>>     x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n))
>>     x
>> }
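To see why ends - starts gives the run lengths, it helps to print the intermediate vectors for the example input. Padding with a 0 at each end means every run of 1s produces a +1 step at its first element and a -1 step just past its last element, so the two index vectors always pair up, even when the final run reaches the end of x. This is an illustrative trace, not part of the original reply.

```r
x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
# Zero padding guarantees a +1 step where each run of 1s begins
# and a -1 step right after each run ends (even at the end of x)
d <- diff(c(0, x, 0))
starts <- which(d == 1)    # index in x of the first 1 of each run
ends <- which(d == -1)     # index in x just after the last 1 of each run
starts                     # 3  5 10 13 15
ends                       # 4  8 12 14 21
ends - starts              # 1 3 2 1 6
```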
>> 
>
> Come to think of it, I suggest using the existing R function rle() rather
> than my dodgy substitute.
>
> e.g.
>
> g <- function(x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)) {
>
>     runs <- rle(x)
>     runlengths <- runs$lengths[runs$values == 1]
>     x[x == 1] <- unlist(lapply(runlengths, function(n) 1:n))
>     x
> }
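For readers new to rle(), a sketch of what it returns for the example input: paired vectors of run lengths and run values. Selecting the lengths whose value is 1 gives the runlengths vector used in g(), and because g() only touches elements equal to 1, it also handles inputs that begin with a 1. inverse.rle() reverses the encoding.

```r
x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
runs <- rle(x)
runs$lengths                     # 2 1 1 3 2 2 1 1 1 6
runs$values                      # 0 1 0 1 0 1 0 1 0 1
runs$lengths[runs$values == 1]   # 1 3 2 1 6, one entry per run of 1s
# inverse.rle() rebuilds the original vector from the encoding
identical(inverse.rle(runs), x)  # TRUE
```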
>
> Dan
>
> p.s. R-help would perhaps have been more appropriate than R-devel
>
>
>> Dan
>> 
>> 
>> > 
>> > HTH
>> > 
>> > ---
>> >    Vincent Goulet, Associate Professor
>> >    École d'actuariat
>> >    Université Laval, Québec
>> >    Vincent.Goulet at act.ulaval.ca   http://vgoulet.act.ulaval.ca
>> > 
>> > ______________________________________________
>> > R-devel at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


