[Rd] RFC: "loop connections"

dhinds@sonic.net dhinds at sonic.net
Sat Aug 27 22:11:57 CEST 2005


Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> In the meantime, I think it has become clear that
> "loopconnection" isn't necessarily a better name, and that
> textConnection() has been there in "the S literature" for a
> good reason and for quite a while.
> Let's forget about the naming and the exact UI for the moment.

That is entirely fine with me.

> I think the main point of David's proposal is still worth
> consideration:  One way to see text connections is as a way to
> treat some kind of R objects as "generalized files" i.e., connections.
> And AFAICS David proposes to enlarge the kind of R objects that
> can be dealt with as connections 
>   from  {"character"} 
>   to    {"character", "raw"} 
> something which has some appeal to me.
> IIUC, Brian Ripley is doubting the potential use for the
> proposed generalization, whereas David makes a point of someone
> else (the 'caTools' author) having written raw2bin / bin2raw functions
> for a related use case.

> Maybe you can elaborate on the above a bit, David?

I'm not sure what more can be said on the subject.  Most connection
types support both text mode and binary mode, so this is partly a
proposal for symmetry and consistency.  Prof. Ripley is correct that
binary anonymous connections provide overlapping functionality, but
the semantics are slightly different, and so is the performance.  I
don't see an advantage in having the "text-like" connection support
only text access.

I ran some quick benchmarks on three implementations.  The task was
to convert back and forth between a numeric vector of length 1000 and
a packed raw vector of single-precision floats, repeated 1000 times.
The first method uses a new anonymous connection for each
transformation.  The second reuses a single anonymous connection.  The
third uses a new raw textConnection for each transformation.

  usr  sys  elapsed    method
  1.5  9.5   14.6      anonymous
  1.1  0.1    1.2      persistent
  0.9  0.0    0.9      raw

Setting up and tearing down anonymous connections is very slow (at
least on Windows) because it requires substantial OS intervention.  If
a program can be easily organized so that a single connection can be
used, performance is much better.
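
For readers unfamiliar with the task being timed: stripped of any
connection machinery, the conversion amounts to narrowing each double
to a 4-byte float and copying its bytes into a raw buffer, then
reversing the process.  The following is a minimal C sketch of that
round trip, not the benchmark code (which was not posted).  The names
pack_floats() and unpack_floats() are hypothetical, and the sketch
assumes a 4-byte IEEE float in native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pack n doubles into n * sizeof(float) bytes,
 * storing each value as a single-precision float in native byte order.
 * dst must have room for n * sizeof(float) bytes. */
void pack_floats(const double *src, size_t n, unsigned char *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        float f = (float) src[i];   /* narrowing may lose precision */
        memcpy(dst + i * sizeof f, &f, sizeof f);
    }
}

/* Hypothetical helper: the inverse, widening each packed float back
 * to a double.  dst must have room for n doubles. */
void unpack_floats(const unsigned char *src, size_t n, double *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        float f;
        memcpy(&f, src + i * sizeof f, sizeof f);
        dst[i] = (double) f;
    }
}
```

Values that are not exactly representable in single precision will
not survive the round trip unchanged, which is inherent in the task
rather than in any of the three connection methods.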

I would appreciate feedback on how to improve raw_write() for the case
of appending to an existing vector.  Is it possible to reserve free
space at the end of a vector for appending?  I see that there is a
distinction between LENGTH() and TRUELENGTH() but I'm not sure if this
is the intended use.
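
To make the question concrete, the usual way to get cheap appends is
to track the bytes in use separately from the bytes allocated, and to
grow the allocation geometrically.  The sketch below shows that
pattern in plain C with malloc-style memory.  It is not R's API: the
byte_buf type and byte_buf_append() are hypothetical names, and
whether LENGTH() and TRUELENGTH() may play the roles of "length" and
"capacity" for an R vector is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer: length is the number of bytes
 * in use, capacity is the number of bytes allocated. */
typedef struct {
    unsigned char *data;
    size_t length;
    size_t capacity;
} byte_buf;

/* Append n bytes from src, growing the allocation by doubling so that
 * repeated appends cost amortized constant time per byte.
 * Returns 0 on success, -1 on overflow or allocation failure. */
int byte_buf_append(byte_buf *buf, const void *src, size_t n)
{
    assert(buf != NULL);
    if (n > SIZE_MAX - buf->length)
        return -1;                          /* length + n would overflow */
    if (n > buf->capacity - buf->length) {  /* not enough spare room */
        size_t need = buf->length + n;
        size_t cap = buf->capacity ? buf->capacity : 64;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) {       /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        unsigned char *p = realloc(buf->data, cap);
        if (p == NULL)
            return -1;                      /* old buffer is still valid */
        buf->data = p;
        buf->capacity = cap;
    }
    if (n > 0)
        memcpy(buf->data + buf->length, src, n);
    buf->length += n;
    return 0;
}
```

A buffer starts out as byte_buf b = {NULL, 0, 0}; and is released with
free(b.data).  The point of the design is that most appends touch only
the spare room between length and capacity, so no reallocation or
copy of the existing contents is needed.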

> In any case, as you might have guessed by now, R-core would have
> been more positive to a proposal to generalize current
> textConnection() - fully back-compatibly - rather than renaming
> it first.

I have no interest in sacrificing backward compatibility; I did intend
that there would always be a textConnection() entry point, if only as
a wrapper for the new constructor.  The only reason for a new name
(and I'm certainly open to suggestions) is that the notion of a
binary or raw textConnection seemed wrong.

-- David Hinds


