[Rd] Small inconsistency in serialize() between R versions and implications on digest()

Henrik Bengtsson hb at stat.berkeley.edu
Wed Mar 7 23:24:29 CET 2007


Hi,

I noticed that serialize() gives different results depending on the R
version, which has implications for the digest() function in the digest
package.  Note that it does give the same output across platforms.  I know
that serialize() is under development, but is this expected?  For example,
is there some kind of header in the result that specifies "who" generated
the stream, and if so, exactly which bytes are they?

SETUP:

R versions:
A) R v2.4.0 (2006-10-03)
B) R v2.4.1pat (2007-01-13 r40470)
C) R v2.5.0dev (2006-12-12 r40167)

This is on WinXP and I start R with Rterm --vanilla.

Example: Identical serialize() calls using the different R versions.

> raw <- serialize(1, connection=NULL, ascii=TRUE)
> print(raw)

gives:

(A): [1] 41 0a 32 0a 31 33 32 30 39 36 0a 31 33 31 38 34 30 0a 31 34
0a 31 0a 31 0a
(B): [1] 41 0a 32 0a 31 33 32 30 39 37 0a 31 33 31 38 34 30 0a 31 34
0a 31 0a 31 0a
(C): [1] 41 0a 32 0a 31 33 32 33 35 32 0a 31 33 31 38 34 30 0a 31 34
0a 31 0a 31 0a

Note the difference in raw bytes 8 to 10, i.e.

> raw[7:11]
(A): [1] 32 30 39 36 0a
(B): [1] 32 30 39 37 0a
(C): [1] 32 33 35 32 0a
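
Since ascii=TRUE produces newline-separated text, the bytes can be read as fields. The following is a minimal sketch that rebuilds the (A) output from the hex values above and splits it. The names hex_a, raw_a, fields and unpack_version are made up for illustration, and reading the third field as a packed version number (major*65536 + minor*256 + patch) is an assumption, although the arithmetic is at least consistent with 2.4.0.

```r
# Bytes printed for case (A), copied from the output above.
hex_a <- c("41", "0a", "32", "0a", "31", "33", "32", "30", "39", "36", "0a",
           "31", "33", "31", "38", "34", "30", "0a", "31", "34", "0a",
           "31", "0a", "31", "0a")
raw_a <- as.raw(strtoi(hex_a, base = 16L))

# The ASCII stream is newline-separated text, so split it into fields.
fields <- strsplit(rawToChar(raw_a), "\n", fixed = TRUE)[[1]]
print(fields)
# "A" "2" "132096" "131840" "14" "1" "1"

# ASSUMPTION: the field packs a version as major*65536 + minor*256 + patch.
unpack_version <- function(x) {
  x <- as.integer(x)
  c(major = x %/% 65536L, minor = (x %/% 256L) %% 256L, patch = x %% 256L)
}

print(unpack_version(fields[3]))   # major 2, minor 4, patch 0
print(unpack_version("132097"))    # major 2, minor 4, patch 1 (case B)
print(unpack_version("132352"))    # major 2, minor 5, patch 0 (case C)
```

Under the same assumption, the fourth field (131840) unpacks to 2.3.0 and is identical in all three outputs.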

Do bytes 8, 9 and 10 in the raw vector somehow contain information
about the R version or similar?  The following poor man's test suggests
that this is the only difference:

On all R versions, the following gives identical results:

> raw <- serialize(1:1e4, connection=NULL, ascii=TRUE)
> raw <- as.integer(raw[-c(8:10)])
> sum(raw)
[1] 2147884
> sum(log(raw))
[1] 177201.2
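
For the short serialize(1) outputs listed earlier, the same comparison can be done byte for byte instead of via checksums. This is only a sketch based on the three hex dumps in this message. The helper parse_hex and the variable names r240, r241 and r250 are hypothetical.

```r
# Hypothetical helper: turn a space-separated hex dump into a raw vector.
parse_hex <- function(s) {
  as.raw(strtoi(strsplit(s, " ", fixed = TRUE)[[1]], base = 16L))
}

# The three serialize(1, connection=NULL, ascii=TRUE) outputs (A), (B), (C).
r240 <- parse_hex("41 0a 32 0a 31 33 32 30 39 36 0a 31 33 31 38 34 30 0a 31 34 0a 31 0a 31 0a")
r241 <- parse_hex("41 0a 32 0a 31 33 32 30 39 37 0a 31 33 31 38 34 30 0a 31 34 0a 31 0a 31 0a")
r250 <- parse_hex("41 0a 32 0a 31 33 32 33 35 32 0a 31 33 31 38 34 30 0a 31 34 0a 31 0a 31 0a")

# Positions at which the outputs differ from (A).
print(which(as.integer(r240) != as.integer(r241)))   # 10
print(which(as.integer(r240) != as.integer(r250)))   # 8 9 10

# After dropping bytes 8 to 10, all three vectors are identical.
drop <- 8:10
same <- identical(r240[-drop], r241[-drop]) &&
        identical(r240[-drop], r250[-drop])
print(same)                                          # TRUE
```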

If it is true that there is an R-version-specific header in serialized
objects, then the digest() function should exclude that header in
order to produce consistent results across R versions, because as it
stands digest(1) gives different results.
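
One way this could look, as a rough sketch only: drop the leading header fields from the ASCII stream and hash what remains. It assumes that the header consists of the first four newline-terminated fields, as the outputs above suggest, and that the R release in use has a serialize() that accepts a version argument, so that the format can be pinned to version 2. The function name strip_ascii_header and its n_fields parameter are hypothetical.

```r
# Hypothetical helper: remove the first n_fields newline-terminated
# fields from an ASCII serialization.
# ASSUMPTION: the header is exactly those leading fields.
strip_ascii_header <- function(raw_vec, n_fields = 4L) {
  newline_pos <- which(raw_vec == as.raw(0x0a))
  stopifnot(length(newline_pos) >= n_fields)
  raw_vec[-seq_len(newline_pos[n_fields])]
}

# Pin the format version so that the header layout matches the one above.
full <- serialize(1, connection = NULL, ascii = TRUE, version = 2)
payload <- strip_ascii_header(full)

print(payload)
print(rawToChar(payload))
```

The resulting payload no longer contains the bytes that varied between (A), (B) and (C), so hashing it, for instance with digest(payload, serialize=FALSE), should not depend on the R version that wrote it, provided the rest of the stream really is unchanged.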

Thank you

Henrik


