[R] Implementing streams in R

Gabriel Baud-Bovy baud-bovy.gabriel at hsr.it
Wed Feb 4 13:38:40 CET 2004


Thanks to Prof Brian Ripley, Duncan Murdoch and Luke Tierney for your replies.
In particular, I found Luke Tierney's info very useful. I just would like to
answer Duncan Murdoch's question (what is a stream [Lazy list in Luke's
implementation] in plain english?) in case it interests somebody else.
In fact, I will quote from "Structure and Interpretation  of Computer
Programs",  MIT Press, 1985, by Abelson and Sussman with some editing of my 
own:

"From an abstract point of view, streams are simply a
sequence of data objects. However, we will find that
straighforward implementation of streams as lists does
not allow us to exploit the power of stream processing. To
solve this problem, we introduced the technique of delayed
evaluation which enables us to represent very large (even infinite) data
structure as streams." (p. 242)

Unlike lists which are usually constructed entirely before
being processed,  "The basic idea is that [a stream is constructed only
partially] and to pass the partial construction to the program that consumes
the stream. If the consumer attempts to access a part of the stream
that has not yet been constructed, the stream will automatically construct
just enough more of itself to enable the consumer to access
the required part" (p. 261)

"We use streams to organize computations on a collections
of data in a way that corresponds in spirit to an electrical
engineer's concept of signal processing system [...], i.e.
in terms of signals flowing through a cascade of stages,
each of which implement part of the program plan." (p. 244)

For example, let us imagine a procedure that enumerate all
Fibonacci number until 20 odd ones have been found. Streams
avoid to have to enumerate anymore Fibonacci numbers than
necessary (this number does not even have to be specified).

accumulate.into.list(20,filter.odds(fibonacci.stream()))

Another example might help to explain why implementing streams
as list is very innefficient. The following expression computes
the second prime in the interval from 10000 to 1000000:

head.stream(tail.stream(filter.prime(enumerate.interval(10000,1000000))))

If streams were implemented as a lists, enumerate.interval would
construct a list of almost a million intergers, then each element
would be tested for primality, then almost all of that would
be ignored since we are interested only in finding the second prime
number in this interval. In a procedural programming style,
we would interleave the enumeration and the filtering and stio when
we reached the second prime. By using the technique of delayed
evaluation, we can use the elegant stream formulation while
preserving the efficiency of incremental computation. [abridged
from p. 260-261]

Gabriel
--------------------------------------------------------------------
Gabriel Baud-Bovy
Faculty of Psychology
UHSR University
via Olgettina, 58	tel: (+39) 02 2643 4839
20132 Milan, Italy	fax: (+39) 02 2643 4892




More information about the R-help mailing list