[Rd] slow load() in R2.6.0

Mark.Bravington at csiro.au Mark.Bravington at csiro.au
Thu Oct 11 09:36:18 CEST 2007


Problem fixed by R-patched, thanks; see comments below.

>On Thu, 11 Oct 2007, Mark.Bravington at csiro.au wrote:
>
>> I'm encountering excruciatingly slow load times for character vectors
>> in R 2.6.0-- up to 30sec for a 15K file that contains a no-attributes
>> character vector of length ~1e4 and object size ~0.5MB. In R 2.5.1, 
>> repeated loads of the same set of files are near-instantaneous.
>>
>> The problem is proving tricky to reproduce consistently from scratch,
>> so I have attached the 3 files used in the examples below.
>
>There was no attachment: since these are (I presume) binary files, can you
>not put them on a website (as suggested by the posting guide)?

Sorry, I would have if I could, but can't at present. The attachments
got through OK to me at least, though. If anyone does have an interest
in the files, let me know off-list and I'll re-send as a zip or
somesuch.

>
>> If I create a similar-looking object from scratch, then save it and
>> re-load it a few times, the problem doesn't always occur... at least not
>> in that session.
>>
>>
>> FWIW I have noticed that the time taken to load seems to be roughly a
>> power of 2 of the "base slow load time"-- could be a red herring.
>>
>> The problem seems specific to character vectors-- I noticed it with 
>> entire workspaces and have whittled it down to char vecs only.
>>
>> The example below is from a brand-new session with only the basic 
>> packages loaded; delays in my real sessions are much longer.
>
>Can you please try R-patched or R-devel.  We've found and solved a couple
>of performance issues with creating STRSXPs, but with character vectors of
>the millions of elements.

Thanks; R-patched fixed it. I did look in R-devel NEWS before posting,
but that doesn't mention the bug fix on CHARSXP which is in the
R-patched NEWS, so I didn't persist.

FWIW in case work is still being done on new CHARSXP: my problems were
with much shorter vectors (~1e4) than the millions mentioned in
patched-NEWS, and the strings were short too: 90% were '' and the other
10% were 'a'. Also, when the previously offending objects are loaded
into 2.6.0patched, they are 3-10X smaller (according to object.size)
than in unpatched-- I was also amazed by the compression! Looks like
unpatched R was allocating at least a 32-byte memory entry per
individual zero-character string. It is down to about 4 bytes per
(zero-character) string in R-patched.
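
For anyone who wants to check their own build, here is a minimal sketch of the kind of test described above. It does not use the original files: the vector is a hypothetical stand-in built to match the description (length 1e4, roughly 90% '' and 10% 'a'), and the names x, f and timings are illustrative only. As noted earlier in the thread, an object created from scratch does not always trigger the slowdown, so a fast result here does not rule the problem out.

```r
# Hypothetical stand-in for the offending objects: a no-attributes character
# vector of length 1e4 with about 90% "" and 10% "a" (an assumption based on
# the description in this message, not the actual data).
set.seed(1)
n <- 1e4
x <- sample(c("", "a"), size = n, replace = TRUE, prob = c(0.9, 0.1))

# Save to a temporary file so the sketch leaves nothing behind.
f <- tempfile()
save(x, file = f)

# Load the same file several times, each time into a fresh scratch
# environment, and record the elapsed time of each load.
timings <- sapply(1:5, function(i) {
  e <- new.env()
  system.time(load(f, envir = e))[["elapsed"]]
})

print(timings)            # elapsed seconds for each repeated load
print(object.size(x))     # in-memory size as reported by object.size
print(file.info(f)$size)  # on-disk size of the saved file, in bytes

unlink(f)
```

Comparing the three printed values between unpatched 2.6.0 and R-patched should show whether load times grow across repeats and how the reported object size differs.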


Mark Bravington

>
>I tried several examples of around 10000 elements and got times of at most
>0.05 secs in 2.6.0.  These included parts of those examples on which we
>had seen performance issues.
>
>A few clues:
>
>- even your base time is much slower than I would expect.
>
>- you say  'a 15K file ... object size ~0.5MB'.  That's pretty phenomenal
>   compression, and I am seeing file sizes more like 100Kb for objects that
>   size.  Since object.size does take into account duplication, one way to
>   get that would be to have all unique elements.  At ca 50bytes per
>   element you would need an average string length of about 15 chars.  Such
>   an object takes about 200Kb as a .rda file.
>
>
>>
>>
>> Mark Bravington
>> CSIRO Mathematical & Information Sciences
>> Marine Laboratory
>> Castray Esplanade
>> Hobart 7001
>> TAS
>>
>> ph (+61) 3 6232 5118
>> fax (+61) 3 6232 5012
>> mob (+61) 438 315 623
>>
>>
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or 
>> 'help.start()' for an HTML browser interface to help. Type 'q()' to 
>> quit R.
>>
>>> system.time( load( 'd:/r2.0/t1.rda'))
>>   user  system elapsed
>>    0.5     0.0     0.5
>>> system.time( load( 'd:/r2.0/t1.rda')) # same file; slower
>>   user  system elapsed
>>    3.5     0.0     3.5
>>> system.time( load( 'd:/r2.0/t1.rda'))
>>   user  system elapsed
>>   4.13    0.00    4.13
>>> system.time( load( 'd:/r2.0/t1.rda'))
>>   user  system elapsed
>>   3.51    0.00    3.52
>>
>>> system.time( load( 'd:/r2.0/t2.rda'))  # different bigger file
>>   user  system elapsed
>>   4.42    0.00    4.42
>>> system.time( load( 'd:/r2.0/t2.rda')) # same file; slower
>>   user  system elapsed
>>  10.44    0.00   10.44
>>> system.time( load( 'd:/r2.0/t2.rda'))
>>   user  system elapsed
>>  10.79    0.00   10.80
>>> system.time( load( 'd:/r2.0/t2.rda'))
>>   user  system elapsed
>>  10.39    0.00   10.41
>>> system.time( load( 'd:/r2.0/t1.rda')) # the smaller file again; slower
>>   user  system elapsed
>>  10.67    0.00   10.69
>>> system.time( load( 'd:/r2.0/t3.rda')) # different smaller file
>>   user  system elapsed
>>  10.51    0.00   10.52
>>> system.time( load( 'd:/r2.0/t2.rda')) # now bigger file again: slower
>>   user  system elapsed
>>  14.61    0.00   14.61
>>
>>
>>
>> --please do not edit the information below--
>>
>> Version:
>> platform = i386-pc-mingw32
>> arch = i386
>> os = mingw32
>> system = i386, mingw32
>> status =
>> major = 2
>> minor = 6.0
>> year = 2007
>> month = 10
>> day = 03
>> svn rev = 43063
>> language = R
>> version.string = R version 2.6.0 (2007-10-03)
>>
>> Windows XP (build 2600) Service Pack 2.0
>>
>> Locale: 
>> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>>
>> Search Path:
>> .GlobalEnv, package:stats, package:graphics, package:grDevices, 
>> package:utils, package:datasets, package:methods, Autoloads, 
>> package:base
>>
>
>-- 
>Brian D. Ripley,                  ripley at stats.ox.ac.uk
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford,             Tel:  +44 1865 272861 (self)
>1 South Parks Road,                     +44 1865 272866 (PA)
>Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


