[Rd] slow load() in R2.6.0

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Oct 11 06:27:22 CEST 2007


On Thu, 11 Oct 2007, Mark.Bravington at csiro.au wrote:

> I'm encountering excruciatingly slow load times for character vectors in
> R 2.6.0-- up to 30sec for a 15K file that contains a no-attributes
> character vector of length ~1e4 and object size ~0.5MB. In R 2.5.1,
> repeated loads of the same set of files are near-instantaneous.
>
> The problem is proving tricky to reproduce consistently from scratch, so
> I have attached the 3 files used in the examples below.

There was no attachment: since these are (I presume) binary files, can you 
not put them on a website (as suggested by the posting guide)?

> If I create a similar-looking object from scratch, then save it and 
> re-load it a few times, the problem doesn't always occur... at least not 
> in that session.
>
>
> FWIW I have noticed that the time taken to load seems to be roughly a
> power of 2 of the "base slow load time"-- could be a red herring.
>
> The problem seems specific to character vectors-- I noticed it with
> entire workspaces and have whittled it down to char vecs only.
>
> The example below is from a brand-new session with only the basic
> packages loaded; delays in my real sessions are much longer.

Can you please try R-patched or R-devel?  We've found and solved a couple
of performance issues with creating STRSXPs, but those involved character
vectors of millions of elements.

I tried several examples of around 10000 elements and got times of at most 
0.05 secs in 2.6.0.  These included parts of those examples on which we 
had seen performance issues.
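A rough way to run the same kind of check is to build a character vector of about 10000 elements, save it, and time several successive loads of the same file. This is only a sketch. Random 15-letter strings (essentially all unique) stand in for the attached files, which never arrived. The temporary file and the helper function are illustrative, not taken from the original report. Timings will depend on platform and R version.

```r
## Sketch: time repeated load() calls on a ~1e4-element character vector.
## Assumption: random 15-letter strings stand in for the missing .rda files.
set.seed(1)
n <- 10000
make_string <- function(i)            # hypothetical helper
    paste(sample(letters, 15, replace = TRUE), collapse = "")
x <- sapply(seq_len(n), make_string)   # character vector of length n
f <- tempfile(fileext = ".rda")        # temporary file, not the user's path
save(x, file = f)
## load the same file four times; each load goes into the anonymous
## function's frame, so the global 'x' is left untouched
times <- sapply(1:4, function(i) system.time(load(f))[["elapsed"]])
print(times)
```

If the slowdown is present, the elapsed times should grow from one load to the next, as in the transcript quoted further down. If it is absent, they should stay roughly constant.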

A few clues:

- even your base time is much slower than I would expect.

- you say 'a 15K file ... object size ~0.5MB'.  That's pretty phenomenal
   compression, and I am seeing file sizes more like 100Kb for objects of
   that size.  Since object.size does take into account duplication, one way
   to get that would be to have all unique elements.  At ca 50 bytes per
   element you would need an average string length of about 15 chars.  Such
   an object takes about 200Kb as a .rda file.
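The size reasoning above can be checked directly. Compare object.size() against the size of the saved file for about 10000 distinct strings of 15 characters. This is a sketch under the same stand-in assumption as before: random lowercase strings, written to a temporary file. The exact bytes per element and file size will differ between 32-bit and 64-bit builds and between R versions, so no particular numbers are claimed here.

```r
## Sketch: in-memory size vs. saved (compressed) file size.
## Assumption: ~1e4 essentially unique random strings of 15 characters.
set.seed(1)
n <- 10000
x <- sapply(seq_len(n), function(i)
    paste(sample(letters, 15, replace = TRUE), collapse = ""))
sz  <- as.numeric(object.size(x))    # bytes reported by object.size()
f   <- tempfile(fileext = ".rda")
save(x, file = f)                    # save() compresses by default
fsz <- file.info(f)$size             # bytes on disk
cat("bytes per element in memory:", sz / n, "\n")
cat("file size (KB):", fsz / 1024, "\n")
cat("ratio object size / file size:", sz / fsz, "\n")
```

A ratio far above what this produces (0.5MB vs 15K is more than 30:1) would suggest heavy duplication among the strings rather than all-unique elements.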


>
>
> Mark Bravington
> CSIRO Mathematical & Information Sciences
> Marine Laboratory
> Castray Esplanade
> Hobart 7001
> TAS
>
> ph (+61) 3 6232 5118
> fax (+61) 3 6232 5012
> mob (+61) 438 315 623
>
>
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> system.time( load( 'd:/r2.0/t1.rda'))
>   user  system elapsed
>    0.5     0.0     0.5
>> system.time( load( 'd:/r2.0/t1.rda')) # same file; slower
>   user  system elapsed
>    3.5     0.0     3.5
>> system.time( load( 'd:/r2.0/t1.rda'))
>   user  system elapsed
>   4.13    0.00    4.13
>> system.time( load( 'd:/r2.0/t1.rda'))
>   user  system elapsed
>   3.51    0.00    3.52
>
>> system.time( load( 'd:/r2.0/t2.rda'))  # different bigger file
>   user  system elapsed
>   4.42    0.00    4.42
>> system.time( load( 'd:/r2.0/t2.rda')) # same file; slower
>   user  system elapsed
>  10.44    0.00   10.44
>> system.time( load( 'd:/r2.0/t2.rda'))
>   user  system elapsed
>  10.79    0.00   10.80
>> system.time( load( 'd:/r2.0/t2.rda'))
>   user  system elapsed
>  10.39    0.00   10.41
>> system.time( load( 'd:/r2.0/t1.rda')) # the smaller file again; slower
>   user  system elapsed
>  10.67    0.00   10.69
>> system.time( load( 'd:/r2.0/t3.rda')) # different smaller file
>   user  system elapsed
>  10.51    0.00   10.52
>> system.time( load( 'd:/r2.0/t2.rda')) # now bigger file again: slower
>   user  system elapsed
>  14.61    0.00   14.61
>
>
>
> --please do not edit the information below--
>
> Version:
> platform = i386-pc-mingw32
> arch = i386
> os = mingw32
> system = i386, mingw32
> status =
> major = 2
> minor = 6.0
> year = 2007
> month = 10
> day = 03
> svn rev = 43063
> language = R
> version.string = R version 2.6.0 (2007-10-03)
>
> Windows XP (build 2600) Service Pack 2.0
>
> Locale:
> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>
> Search Path:
> .GlobalEnv, package:stats, package:graphics, package:grDevices,
> package:utils, package:datasets, package:methods, Autoloads,
> package:base
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


