[Rd] Implementing a single-precision class with raw

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Aug 22 12:38:08 CEST 2005


On Fri, 19 Aug 2005, Colin A. Smith wrote:

> A package that I develop (xcms) sometimes needs to read and process
> vectors several hundreds of megabytes in size. (They only represent
> parts of a large data sets which can approach nearly 100GB.)
> Unfortunately, R sometimes hits the 2GB memory limit of Win32.

The rw-FAQ explains why that is _not_ the limit!

> To help cut the memory footprint in half, I'm implementing a "float" 
> class as a subclass of "raw".

Why via "raw"?

I believe the intention is that this sort of thing be done via external 
references, but as float and int are the same size on all current 
platforms, I would have considered R integers for storage. Then, for
example, subsetting would work, and you would have a 4x larger size limit
on 64-bit platforms.  (You would also have got automatic handling of
endianness.)
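The integer-storage idea can be sketched in plain C. This sketch assumes 32-bit IEEE floats and uses hypothetical helper names (pack_floats, unpack_float). It rounds doubles to single precision and keeps each bit pattern in an int32_t slot, the same 4 bytes per element that an R integer vector provides. In a .Call routine the buffer would come from INTEGER(x) on an integer vector; that wiring is left out so the sketch compiles without R headers.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: single-precision values kept in 32-bit integer slots,
   roughly what storing floats in an R integer vector amounts to.
   pack_floats() and unpack_float() are hypothetical helper names. */

_Static_assert(sizeof(float) == sizeof(int32_t),
               "this sketch assumes a 32-bit float");

/* Round each double to float and copy its bit pattern into an int32_t slot.
   memcpy is used instead of a pointer cast to stay within C aliasing rules. */
void pack_floats(const double *src, int32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float f = (float) src[i];
        memcpy(&dst[i], &f, sizeof f);
    }
}

/* Recover element i as a float from the int32_t storage. */
float unpack_float(const int32_t *buf, size_t i)
{
    float f;
    memcpy(&f, &buf[i], sizeof f);
    return f;
}
```

Each element takes 4 bytes instead of 8, so n values need half the memory of a double vector. One caveat worth checking: R uses INT_MIN as NA_INTEGER, and that bit pattern (0x80000000) is also -0.0f, so such elements will show up as NA if the vector is inspected from R.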

> Because almost all the computation on the large vectors is done in C 
> code, having a somewhat limited single-precision data type is okay.
>
> I've run into a limitation with the .C() function where it does not
> handle raw vectors, which it will do in 2.2.0.

That is just not true!

> In the meantime, I'm using the .Call() function to access the raw 
> vectors. However, there don't seem to be any macros for handling raw 
> vectors in Rdefines.h.

So?  We recommend using Rinternals.h: Rdefines.h is a compatibility 
wrapper for macros from S4.  No attempt has been made to make the raw type
compatible with S4, and we are not aware of any user who has compiled S4
code using raw vectors that (s)he wishes to port to R.

(The R-exts.texi manual has been rather too optimistic about Rdefines.h: 
as you need to use SET_STRING_ELT and SET_VECTOR_ELT in R, you are rather 
limited as to what you can do in S4 style.  This has been so since R 1.2.0 
and Rdefines.h has hardly been updated since.)

> I've made a guess at what those macros would be and was wondering
> whether my guesses were correct and/or might be included in 2.2.0:
>
> #define NEW_RAW(n) allocVector(RAWSXP,n)
> #define RAW_POINTER(x) (RAW(x))
> #define AS_RAW(x) coerceVector(x,RAWSXP)
>
> I'm not sure whether coerceVector(x,RAWSXP) will actually work.

You should have read the code to find out (people answering your comment 
would have had to).  It will `actually work', but it may not do whatever 
it is that you expect.  (It interprets its input as integer (decimal if a 
string) representations of the bytes.)

This is in contrast to S, where I have no idea precisely what AS_RAW is 
supposed to do and no code to read.  (as(, "raw") seems to do weird and 
unpredictable things, though, and the Green Book suggests that coercion 
probably is not intended to work.)

For completeness I have added my (informed) guesses to Rdefines.h in 
R-devel.
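To see why coerceVector(x, RAWSXP) may not suit a float-in-raw class, here is a plain-C sketch contrasting value coercion with byte reinterpretation. The function names coerce_to_byte and float_bytes are hypothetical. The rule that out-of-range values become 0 follows current R's as.raw() and is an assumption as far as the 2005 sources go; string input is not modelled.

```c
#include <string.h>

/* Value-based coercion of one number to a byte, modelled on (not copied
   from) R's coercion to raw: truncate toward zero, and map values outside
   0..255 to 0.  coerce_to_byte() is a hypothetical name. */
unsigned char coerce_to_byte(double v)
{
    if (!(v >= 0.0 && v < 256.0))   /* out of range, or NaN */
        return 0;
    return (unsigned char) v;       /* conversion truncates toward zero */
}

/* Bit-level view: the bytes that actually represent a float in memory,
   in host byte order.  float_bytes() is also a hypothetical name. */
void float_bytes(float f, unsigned char out[sizeof(float)])
{
    memcpy(out, &f, sizeof f);
}
```

coerce_to_byte(1.5) yields the single byte 1, whereas float_bytes(1.5f, b) fills b with the four bytes of 0x3FC00000 in host order. A "float" class stored in raw needs the second view, so it would copy bytes itself rather than rely on coercion.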

> Also, there isn't an Rf_isRaw() function, which would be useful for an 
> IS_RAW(x) macro.

Why would this be necessary?  TYPEOF(x) == RAWSXP is all that is needed.
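The one-line test can be wrapped as a macro. So that the sketch compiles without R installed, it uses labelled stand-ins: mock_sexp, MOCK_TYPEOF and MOCK_RAWSXP are hypothetical, and their values are arbitrary rather than R's. In package code the same line would read `#define IS_RAW(x) (TYPEOF(x) == RAWSXP)` with TYPEOF and RAWSXP taken from Rinternals.h.

```c
/* Hypothetical stand-ins so this compiles without R; in real code SEXP,
   TYPEOF() and RAWSXP come from Rinternals.h.  Enum values are arbitrary. */
typedef enum { MOCK_INTSXP, MOCK_REALSXP, MOCK_RAWSXP } mock_type;
typedef struct { mock_type type; } mock_sexp;
#define MOCK_TYPEOF(x) ((x)->type)

/* A plain type-tag comparison is all an IS_RAW check needs. */
#define IS_RAW(x) (MOCK_TYPEOF(x) == MOCK_RAWSXP)
```

In a .Call routine one would typically test the argument on entry and call error() if it is not a raw vector.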

> Another issue with the "float" class is that it will run into endian
> issues if it ever gets saved to disk and moved cross-platform. I don't
> really anticipate that happening but it might be nice to incorporate
> serialization hooks if possible. Are there any facilities in R for
> doing that?

See the comment above: had you stored the values in R integers, R would
have handled the endianness for you.
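Integer storage helps here because R, with its default (XDR) save format, writes integers in a fixed big-endian byte order, so 4-byte float patterns held in integers travel correctly. Code that keeps its own raw bytes would have to do the equivalent itself. The plain-C sketch below shows one way to fix the on-disk order. It assumes 32-bit IEEE floats whose byte order matches that of integers on the host; put_float_be and get_float_be are hypothetical names, not an R facility.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Write a float as 4 big-endian bytes.  The shifts act on the value of the
   32-bit pattern, not on its memory layout, so the output is the same on
   little- and big-endian hosts (assuming float and integer byte orders
   agree).  put_float_be() is a hypothetical name. */
void put_float_be(float f, unsigned char out[4])
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    out[0] = (unsigned char) (u >> 24);
    out[1] = (unsigned char) (u >> 16);
    out[2] = (unsigned char) (u >> 8);
    out[3] = (unsigned char) u;
}

/* Inverse of put_float_be(): rebuild the pattern, then copy it into a float.
   get_float_be() is a hypothetical name. */
float get_float_be(const unsigned char in[4])
{
    uint32_t u = ((uint32_t) in[0] << 24) | ((uint32_t) in[1] << 16)
               | ((uint32_t) in[2] << 8) | (uint32_t) in[3];
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Because the byte order is fixed in the file rather than inherited from the writing machine, a file written on a little-endian PC reads back correctly on a big-endian workstation.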

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


