[Rd] serialize() via a temporary file is heaps faster than doing it directly (on Windows)

Henrik Bengtsson hb at stat.berkeley.edu
Fri Aug 29 21:43:37 CEST 2008


I just want to re-post this thread in case it slipped through the
"summer sieve" of someone who might be interested and/or has a real
solution beyond my serialize2() patch.

Cheers

Henrik

On Thu, Jul 24, 2008 at 8:10 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> Hi,
>
> FYI, I just noticed that on Windows (but not Linux) it is orders of
> magnitude (below it's about 50x) faster to serialize() an object to a
> temporary file and then read it back than to serialize it directly to
> a raw vector (connection=NULL).  This affects, for instance, how fast
> digest::digest() can compute a checksum.
>
> Example:
> x <- 1:1e7;
> t1 <- system.time(raw1 <- serialize(x, connection=NULL));
> print(t1);
> #    user  system elapsed
> #   174.23  129.35  304.70  ## 5 minutes
> t2 <- system.time(raw2 <- serialize2(x, connection=NULL));
> print(t2);
> #     user  system elapsed
> #     2.19    0.18    5.72      ## 5 seconds
> print(t1/t2);
> #      user    system   elapsed
> #   79.55708 718.61111  53.26923
> stopifnot(identical(raw1, raw2));
>
> where serialize2() serializes to a temporary file and reads the bytes back:
>
> serialize2 <- function(object, connection, ...) {
>  if (is.null(connection)) {
>    # It is faster to serialize to a temporary file and read it back
>    pathname <- tempfile();
>    con <- file(pathname, open="wb");
>    on.exit({
>      if (!is.null(con))
>        close(con);
>      if (file.exists(pathname))
>        file.remove(pathname);
>    });
>    base::serialize(object, connection=con, ...);
>    close(con);
>    con <- NULL;
>    fileSize <- file.info(pathname)$size;
>    readBin(pathname, what="raw", n=fileSize);
>  } else {
>    base::serialize(object, connection=connection, ...);
>  }
> } # serialize2()
>
> The above benchmarking was done in a fresh R v2.7.1 session on WinXP Pro:
>
>> sessionInfo()
> R version 2.7.1 Patched (2008-06-27 r46012)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
> When I do the same on a Linux machine there is no difference:
>
>> sessionInfo()
> R version 2.7.1 (2008-06-23)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> Is there an obvious reason (and an obvious fix) for this?
>
> Cheers
>
> Henrik
>
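
A minimal sketch for checking whether the slowdown reproduces on a given machine, assuming a current R installation. The names serializeViaFile() and compareSerialize() are hypothetical helpers, not part of base R or of the original post. serializeViaFile() applies the same temporary-file idea as serialize2() for the connection=NULL case, and compareSerialize() times it against plain serialize() for a few vector sizes and checks that both return identical bytes. Timings depend on platform and R version, so no results are claimed here.

```r
# Hypothetical helper: serialize 'object' to a temporary file and read the
# bytes back (same idea as serialize2() when connection = NULL).
serializeViaFile <- function(object, ...) {
  pathname <- tempfile()
  # Remove the temporary file however the function exits.
  on.exit(unlink(pathname), add = TRUE)
  con <- file(pathname, open = "wb")
  # Close the connection even if serialize() signals an error.
  tryCatch(serialize(object, connection = con, ...), finally = close(con))
  readBin(pathname, what = "raw", n = file.info(pathname)$size)
}

# Hypothetical benchmark harness: time direct serialization versus the
# temporary-file round trip for several vector sizes.
compareSerialize <- function(sizes = c(1e4, 1e5, 1e6)) {
  rows <- lapply(sizes, function(n) {
    x <- rnorm(n)  # ordinary double vector of length n
    tDirect <- system.time(raw1 <- serialize(x, connection = NULL))[["elapsed"]]
    tFile <- system.time(raw2 <- serializeViaFile(x))[["elapsed"]]
    data.frame(n = n, direct = tDirect, viaFile = tFile,
               identical = identical(raw1, raw2))
  })
  do.call(rbind, rows)
}

res <- compareSerialize()
print(res)
```

Using tryCatch(..., finally = close(con)) closes the connection even if serialize() fails, and on.exit(unlink(pathname)) removes the temporary file in every case, so the explicit con <- NULL bookkeeping in serialize2() is not needed here.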



More information about the R-devel mailing list