[Rd] binary file access [was "RFC: System and time support"...]

Prof Brian D Ripley ripley@stats.ox.ac.uk
Tue, 25 Jul 2000 10:33:17 +0100 (BST)

On Tue, 25 Jul 2000, Martin Maechler wrote:

> >>>>> "Duncan" == Duncan Murdoch <murdoch@stats.uwo.ca> writes:
>     Duncan> Is there any interest in adding binary file access to the base?
> yes, quite a bit !  (e.g., someone here doing image analysis, would have
> liked to be able to do  this)
>     Duncan> I think it would be really useful, and have put together a
>     Duncan> prototype (still for Windows only) that's on my web site at
> 	                        ==============
>     Duncan> <http://www.stats.uwo.ca/faculty/murdoch/software/Rstreams.zip>
> I first thought "great! .." when you announced this a while ago, but
> "Windows only" & relying on Delphi, i.e. proprietary software,
> stopped me from even having a look, sorry.
> We are committed primarily to the POSIX "clarification" of ANSI C and freely
> available tools.

I don't think that translating/re-writing it is a problem, but I thought
Duncan was planning to do this.  If not I will have a go.

> An aside :
>   Your binary files are read into/from "character", right?

[Doesn't look like it to me!]

>   I think (and others have talked similarly, here) that byte wise
>   reading and writing of files should go together with a "raw" atomic data
>   type in R -- and then we probably would want to do it "S version 4" (Sv4)
>   compatibly (not that I have looked at what this would mean exactly),
>   which needs even a few more extensions in "base R" than we have now.

Probably, but I find byte-wise reading not at all useful (and none of
our image-analysis files are that shallow).

> Coming back to your package:
> Is it worth/fast enough to port this to POSIX C?
> Have you ever compared it to the (very general) approach taken by Sv4 ?
> That would be something worth following at least in parts, I think.

Introducing a new type in R looks to me like a fairly major undertaking.
Duncan's package does a different task at a higher level than `raw' (which
is just an unstructured stream of bytes). As ?readint says

  Signed integers of sizes 1, 2, 4, and 8 bytes can be read.  Unsigned
  integers of any size up to 8 bytes can be read. (Integers larger than are
  supported in R will be returned in a vector of doubles.)  Floats of sizes
  4, 8, and 10 bytes can be read.  Complex values using any of the float
  sizes for the real and complex parts can be read.  Any size of character
  string that you can create can be read.

although some of that is Windows-specific (10 bytes = 80 bits = extended
format, I presume).
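
To make the quoted description concrete, here is a minimal sketch of typed
reading from a byte buffer. It is not Duncan's code and not R: it is Python,
using only the standard struct module, and the helper names read_int and
read_float are hypothetical. It assumes little-endian data by default and
leaves out the 10-byte extended format, which struct does not support.

```python
import struct

SIGNED_SIZES = (1, 2, 4, 8)          # signed sizes named in the help text
FLOAT_CODES = {4: "f", 8: "d"}       # struct has no 10-byte extended float


def read_int(buf, offset, size, signed=True, byteorder="little"):
    """Decode one integer of `size` bytes starting at `offset` (hypothetical helper)."""
    if signed and size not in SIGNED_SIZES:
        raise ValueError("signed integers must be 1, 2, 4 or 8 bytes")
    if not signed and not 1 <= size <= 8:
        raise ValueError("unsigned integers must be 1 to 8 bytes")
    chunk = buf[offset:offset + size]
    if len(chunk) != size:
        raise ValueError("not enough bytes in buffer")
    return int.from_bytes(chunk, byteorder, signed=signed)


def read_float(buf, offset, size, byteorder="little"):
    """Decode one IEEE float of 4 or 8 bytes starting at `offset` (hypothetical helper)."""
    if size not in FLOAT_CODES:
        raise ValueError("only 4- and 8-byte floats are handled in this sketch")
    prefix = "<" if byteorder == "little" else ">"
    (value,) = struct.unpack_from(prefix + FLOAT_CODES[size], buf, offset)
    return value


# A 2-byte signed int, a 4-byte unsigned int and an 8-byte double, packed
# back to back with no padding.
record = struct.pack("<hId", -2, 70000, 1.5)
fields = (
    read_int(record, 0, 2),
    read_int(record, 2, 4, signed=False),
    read_float(record, 6, 8),
)
```

The point of the sketch is that the caller states a size and a type and gets
numbers back, which is a higher-level task than handing over raw bytes.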

I suspect the best way forward is to get a general (non-Delphi, both Unix
and Windows) contributed package working and on CRAN, and then think about
merging it into base if it looks worthwhile.  (There is a lot of very
useful stuff not in base, and the point of my original posting was that
those are things which need to be internal and OS-specific.)

BTW, I think something like inttostr (but not that name) and its converse
would be useful in base.  Its description reads

  Converts an integer to a string representation in base 2 to 36.

My memory says S had a function called something like oddometer, but
I can't find it.
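
For illustration only, the conversion and its converse might look like the
following. This is a Python sketch, not the package's inttostr; the names
int_to_str and str_to_int are hypothetical, and the digit set 0-9 followed
by a-z is an assumption about how bases above 10 would be written.

```python
import string

DIGITS = string.digits + string.ascii_lowercase   # 36 symbols: 0-9 then a-z


def int_to_str(n, base=10):
    """Return the representation of integer n in the given base (2 to 36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, r = divmod(n, base)       # peel off the least significant digit
        out.append(DIGITS[r])
    return sign + "".join(reversed(out))


def str_to_int(s, base=10):
    """Converse of int_to_str; Python's built-in int() already parses bases 2 to 36."""
    return int(s, base)
```

With these definitions int_to_str(255, 16) gives "ff", and
str_to_int("ff", 16) gives back 255.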

Another comment: The R code uses _, F and T and is seriously lacking in
spaces. One way to get standard formatting is to set
options(keep.source=FALSE) and then read in and dump the code.

How much support is there for adding a `raw' (byte-stream) type?


Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
