[R] Runs Statistic

Prof Brian D Ripley ripley at stats.ox.ac.uk
Tue Sep 4 08:00:06 CEST 2001


On Tue, 4 Sep 2001, Janusz Kawczak wrote:

> Dear R users:
>
> I am trying to get the runs statistic for very long binary sequences
> and I ran into trouble with the speed when using (probably too many) "if"
> and "for" statements in my program. Let me explain what I would like to
> see as the final output. Say I have the sequence 1001101000. The runs vector
> should be r=(1,2,2,1,1,3), i.e. the lengths of the consecutive runs of
> identical values. Also, as a possibly independent question, the longest
> run is of interest.

> inp <- as.numeric(strsplit("1001101000", "")[[1]])
> inp
 [1] 1 0 0 1 1 0 1 0 0 0
> rle(inp)$lengths
[1] 1 2 2 1 1 3
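
For the longest run, the same rle() result can be reused. A minimal sketch, assuming the first longest run is the one wanted when there are ties; the names runs, longest and longest_value are only illustrative:

```r
# Same example sequence as above.
inp <- as.numeric(strsplit("1001101000", "")[[1]])

runs <- rle(inp)                      # list with $lengths and $values

# Length of the longest run, over both symbols.
longest <- max(runs$lengths)

# Symbol (0 or 1) of the first run that attains that length.
longest_value <- runs$values[which.max(runs$lengths)]

# Longest run of each symbol separately (names are "0" and "1").
longest_by_value <- tapply(runs$lengths, runs$values, max)
```

Here the longest run has length 3 and consists of 0s.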

> Just to give you an idea the length of the original sequences will be of
> order 10^6 and greater.

That might tax rle, but you can do it in pieces if needed. However, on a
1GHz 512Mb machine:

> inp <- rbinom(1e6, 1, 0.4)
> system.time(foo <- rle(inp)$lengths)
[1] 3.49 0.25 3.74 0.00 0.00
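
If rle() on the whole vector does turn out to be too heavy, doing it in pieces could look roughly like the sketch below. The function rle_lengths_chunked and its chunk_size argument are hypothetical, not part of base R; the only subtlety is that a run may straddle a chunk boundary, so adjacent pieces with the same value have to be merged afterwards.

```r
# Hypothetical helper (not in base R): run lengths computed chunk by chunk.
rle_lengths_chunked <- function(x, chunk_size = 1e5) {
  if (length(x) == 0) return(integer(0))

  starts <- seq(1, length(x), by = chunk_size)
  lens <- vector("list", length(starts))
  vals <- vector("list", length(starts))

  # rle() is applied to one chunk at a time, so its temporary
  # vectors are never longer than chunk_size.
  for (i in seq_along(starts)) {
    idx <- starts[i]:min(starts[i] + chunk_size - 1, length(x))
    r <- rle(x[idx])
    lens[[i]] <- r$lengths
    vals[[i]] <- r$values
  }
  lens <- unlist(lens)
  vals <- unlist(vals)

  # Neighbouring runs with equal values can only occur at chunk
  # boundaries; give them the same group id and sum their lengths.
  grp <- cumsum(c(TRUE, vals[-1] != vals[-length(vals)]))
  as.vector(tapply(lens, grp, sum))
}

# Check against the small example from the question, with tiny chunks.
inp <- as.numeric(strsplit("1001101000", "")[[1]])
small <- rle_lengths_chunked(inp, chunk_size = 3)

# Check against plain rle() on a random sequence.
set.seed(1)
x <- rbinom(1000, 1, 0.4)
chunked <- rle_lengths_chunked(x, chunk_size = 37)
```

Whether this is actually faster or lighter than a single rle() call depends on the machine and the chunk size, so it is worth timing both with system.time() before relying on it.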


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
