[Rd] New flag bit for serialized used by pqR

Radford Neal radford at cs.toronto.edu
Sun Sep 7 02:25:49 CEST 2014


I will shortly be releasing a new version of pqR (you can get a test
version from pqR-project.org now - scroll to the bottom of the page).
One new feature in this version requires adding a bit to the flags
written out when data is serialized.  I thought I'd let you know about
this so as to avoid any possible conflicts.

The new feature is that a few R objects are defined as constants,
which are probably put in read-only memory by the C compiler.  These
include NULL, TRUE, FALSE, NA, 0L, 1L, ... 10L, 0.0, 1.0, and the
one-element pairlists containing these.  Apart from NULL, these
constants are not guaranteed to be used for all instances of these
values, but they often are.  When data is serialized and then read
back in, I'd like for the occurrences of these constants to be
re-created as constants.  One might always use these constants when
reading in data, but I'm not doing that now because I'm not confident
that there is no code relying on certain instances of these objects
being unshared.

So I write out the constants with a flag bit saying they are
constants, and re-create them as constants only if this bit is set
(and they have constant versions in the current implementation).  Old
workspaces will never have this bit set, so nothing will be re-created
as constants (except NULL).  If a workspace with some of these
constant flag bits set is read by an old version of R, the flag bits
will just be ignored (by UnpackFlags in serialize.c), so the objects
will be restored the same as if they had been written by such an old
version.
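As a minimal sketch of this behaviour (not the actual pqR or R code):
the struct and the functions pack_flags and unpack_flags below are
hypothetical names made up for illustration, and the knows_constant_bit
argument simply models whether the reader is a version that tests the
new bit. Only the masks are taken from serialize.c.

```c
/* Masks as declared in serialize.c; IS_CONSTANT_MASK is the pqR addition. */
#define IS_OBJECT_BIT_MASK (1 << 8)
#define HAS_ATTR_BIT_MASK  (1 << 9)
#define HAS_TAG_BIT_MASK   (1 << 10)
#define IS_CONSTANT_MASK   (1 << 11)
#define ENCODE_LEVELS(v)   ((v) << 12)
#define DECODE_LEVELS(v)   ((v) >> 12)
#define DECODE_TYPE(v)     ((v) & 255)

/* Hypothetical container for the unpacked fields (illustration only). */
struct flags {
    int type, levels, is_obj, has_attr, has_tag, is_constant;
};

/* Hypothetical writer: packs all fields, including the constant bit. */
static int pack_flags(struct flags f)
{
    int v = f.type | ENCODE_LEVELS(f.levels);
    if (f.is_obj)      v |= IS_OBJECT_BIT_MASK;
    if (f.has_attr)    v |= HAS_ATTR_BIT_MASK;
    if (f.has_tag)     v |= HAS_TAG_BIT_MASK;
    if (f.is_constant) v |= IS_CONSTANT_MASK;
    return v;
}

/* Hypothetical reader.  knows_constant_bit == 0 models an old version
   that never looks at bit 11; all other fields decode identically. */
static struct flags unpack_flags(int v, int knows_constant_bit)
{
    struct flags f;
    f.type        = DECODE_TYPE(v);
    f.levels      = DECODE_LEVELS(v);
    f.is_obj      = (v & IS_OBJECT_BIT_MASK) != 0;
    f.has_attr    = (v & HAS_ATTR_BIT_MASK) != 0;
    f.has_tag     = (v & HAS_TAG_BIT_MASK) != 0;
    f.is_constant = knows_constant_bit && (v & IS_CONSTANT_MASK) != 0;
    return f;
}
```

Because bit 11 lies between the type field (bits 0-7) and the levels
field (bits 12 and up), setting it changes neither DECODE_TYPE nor
DECODE_LEVELS, which is why a reader that ignores it sees the same
object it would have seen otherwise.
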

So this should all work fine unless R Core implementations start using
this bit for something else.  (Or unless some old version of R used it
for something else - which isn't the case as far as I can tell, but
please let me know if you know of such usage.)

There are four more unused bits at the top of the 32-bit word that is
written out, and two more could be scrounged by storing the "type" in
six bits rather than eight, so there doesn't seem to be an immediate
shortage of bits.

The relevant declarations in serialize.c are as follows:

/*
 * Type/Flag Packing and Unpacking
 *
 * To reduce space consumption for serializing code (lots of list
 * structure) the type (at most 8 bits), several single bit flags,
 * and the sxpinfo gp field (LEVELS, 16 bits) are packed into a single
 * integer.  The integer is signed, so this shouldn't be pushed too
 * far.  It assumes at least 28 bits, but that should be no problem.
 */

#define IS_OBJECT_BIT_MASK (1 << 8)
#define HAS_ATTR_BIT_MASK (1 << 9)
#define HAS_TAG_BIT_MASK (1 << 10)
#define IS_CONSTANT_MASK (1 << 11)       /* <<--- added in pqR */
#define ENCODE_LEVELS(v) ((v) << 12)
#define DECODE_LEVELS(v) ((v) >> 12)
#define DECODE_TYPE(v) ((v) & 255)
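
The bit budget implied by these declarations can be checked with a
small sketch. The names TYPE_MASK, FLAG_MASK, LEVELS_MASK, unused_bits
and count_bits are hypothetical, introduced only for this illustration,
and the code assumes that unsigned int is at least 32 bits wide.

```c
/* Bits 0-7: type; bits 8-11: single-bit flags (bit 11 is the pqR
   constant flag); bits 12-27: the 16-bit LEVELS field. */
#define TYPE_MASK   0xFFu
#define FLAG_MASK   ((1u << 8) | (1u << 9) | (1u << 10) | (1u << 11))
#define LEVELS_MASK (0xFFFFu << 12)

/* Mask of the bits in a 32-bit word that no field above claims. */
static unsigned int unused_bits(void)
{
    return ~(TYPE_MASK | FLAG_MASK | LEVELS_MASK) & 0xFFFFFFFFu;
}

/* Number of set bits in m. */
static int count_bits(unsigned int m)
{
    int n = 0;
    while (m) {
        n += (int)(m & 1u);
        m >>= 1;
    }
    return n;
}
```

Under these assumptions unused_bits() covers bits 28 to 31. Bit 31 is
the sign bit when the word is held in a signed 32-bit int, which is
what the caution in the comment above about the integer being signed
refers to.
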

Please let me know if you see any problem with this, or if for some
reason you'd prefer that I use one of the other four available bits
(in the top of the 32-bit word).

Regards,

    Radford Neal


