[R] fast mkChar

Peter Dalgaard p.dalgaard at biostat.ku.dk
Tue Jun 8 22:23:24 CEST 2004


"Vadim Ogranovich" <vograno at evafunds.com> writes:

> Hi,
>  
> To speed up reading of large (a few million lines) CSV files I am writing
> custom read functions (in C). By timing various approaches I figured out
> that one of the bottlenecks in reading character fields is the mkChar()
> function which on each call incurs a lot of garbage-collection-related
> overhead.
>  
> I wonder if there is a "vectorized" version of mkChar, say mkChar2(char
> **, int length) that converts an array of C strings to a string vector,
> which somehow amortizes the gc overhead over the entire array?
>  
> If no such function exists, I'd appreciate any hint as to how to write
> it.

The real issue here is that character vectors are implemented as
generic vectors of little R objects (CHARSXP type), each of which
holds one string. Allocating all those objects is probably where the
time goes.
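
To make the layout concrete, here is a toy model in plain C. It is only
a sketch and not R's actual code: ToyCharsxp, ToyStrVec, toy_mkchar and
toy_strvec_build are hypothetical names, and malloc() stands in for R's
allocator. It shows that building a vector of n strings this way costs
one allocation for the pointer array plus one per string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CHARSXP: a small header plus the bytes. */
typedef struct {
    size_t length;   /* number of bytes, excluding the terminator */
    char   bytes[];  /* the string itself, NUL-terminated */
} ToyCharsxp;

/* Hypothetical stand-in for a character vector: an array of pointers. */
typedef struct {
    size_t       n;            /* number of elements built so far */
    ToyCharsxp **elt;          /* one pointer per string object */
    size_t       allocations;  /* pointer array + string objects */
} ToyStrVec;

/* Toy analogue of mkChar(): one heap object per string. */
static ToyCharsxp *toy_mkchar(const char *s) {
    size_t len = strlen(s);
    ToyCharsxp *c = malloc(sizeof *c + len + 1);
    if (c == NULL) return NULL;
    c->length = len;
    memcpy(c->bytes, s, len + 1);
    return c;
}

static void toy_strvec_free(ToyStrVec *v) {
    if (v == NULL) return;
    for (size_t i = 0; i < v->n; i++) free(v->elt[i]);
    free(v->elt);
    free(v);
}

/* Build the vector element by element, counting allocations as we go. */
static ToyStrVec *toy_strvec_build(const char **strings, size_t n) {
    ToyStrVec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->n = 0;
    v->allocations = 0;
    v->elt = calloc(n ? n : 1, sizeof *v->elt);
    if (v->elt == NULL) { free(v); return NULL; }
    v->allocations = 1;                    /* the pointer array */
    for (size_t i = 0; i < n; i++) {
        ToyCharsxp *c = toy_mkchar(strings[i]);
        if (c == NULL) { toy_strvec_free(v); return NULL; }
        v->elt[v->n++] = c;
        v->allocations++;                  /* one more little object */
    }
    return v;
}
```

With a few million fields, the per-string allocation (and, in R, the
bookkeeping the collector does for each object) dominates.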

The reason behind the implementation is probably that doing it that
way allows the mechanics of the garbage collector to be applied
directly (CHARSXPs are just vectors of bytes), but it is obviously
wasteful in terms of total allocation. If you can think up something
better, please say so (but remember that the memory management issues
are nontrivial).
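
For comparison, one less wasteful layout would pack all strings into a
single byte buffer with a table of offsets. The sketch below is plain C
and purely illustrative: PackedStrVec, packed_build and packed_get are
hypothetical names, nothing like this exists in R, and it ignores the
hard part, since a collector could no longer track or free each string
on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packed string vector: a fixed number of allocations
   regardless of how many strings it holds. */
typedef struct {
    size_t  n;       /* number of strings */
    size_t *offset;  /* n + 1 entries; string i starts at offset[i] */
    char   *bytes;   /* all strings back to back, each NUL-terminated */
} PackedStrVec;

static void packed_free(PackedStrVec *v) {
    if (v == NULL) return;
    free(v->offset);
    free(v->bytes);
    free(v);
}

static PackedStrVec *packed_build(const char **strings, size_t n) {
    /* First pass: total size, including one terminator per string. */
    size_t total = 0;
    for (size_t i = 0; i < n; i++) total += strlen(strings[i]) + 1;

    PackedStrVec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->n = n;
    v->offset = malloc((n + 1) * sizeof *v->offset);
    v->bytes  = malloc(total ? total : 1);
    if (v->offset == NULL || v->bytes == NULL) {
        packed_free(v);
        return NULL;
    }

    /* Second pass: copy each string and record where it starts. */
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strings[i]) + 1;
        v->offset[i] = pos;
        memcpy(v->bytes + pos, strings[i], len);
        pos += len;
    }
    v->offset[n] = pos;  /* one past the last byte used */
    return v;
}

/* Read-only access to string i; the pointer is into the shared buffer. */
static const char *packed_get(const PackedStrVec *v, size_t i) {
    return v->bytes + v->offset[i];
}
```

The saving comes at a price: elements cannot be shared between vectors
or replaced in place without repacking, which is one reason the memory
management side is not trivial.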

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
