[R] splitting very long character string

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Nov 1 17:14:19 CET 2006


On Wed, 1 Nov 2006, Arne.Muller at sanofi-aventis.com wrote:

> Hello,
>
> I've a very long character array (>500k characters) that I need to split 
> by '\n', resulting in an array of about 60k numbers. The help on strsplit 
> says to use perl=TRUE to get better performance, but it still takes several 
> minutes to split this string.

Can't you use fixed=TRUE since you do not have a regular expression?
Nevertheless, if you are going to be creating about 60k character strings, 
the overhead in creating the strings will be very considerable.
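
A minimal sketch of the fixed=TRUE variant follows. The input is simulated 
and the names big, parts and nums are illustrative only; they are not taken 
from the original post.

```r
# Simulated input: one long string of newline-separated integers
# (a hypothetical stand-in for the string returned by the XML call).
set.seed(1)
big <- paste(sample.int(99999L, 60000L, replace = TRUE), collapse = "\n")

# fixed = TRUE matches "\n" literally instead of as a regular expression.
parts <- strsplit(big, "\n", fixed = TRUE)[[1]]

# This still creates about 60k character strings before converting them.
nums <- as.numeric(parts)
```

How much this helps will depend on the R version and the data, so it is 
worth checking with system.time() on a real file.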

If you just want the numbers, using an anonymous file() connection to 
write out the string and then using scan() might well be a lot more 
efficient.
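
As a rough sketch of that approach, assuming the string holds only 
newline-separated numbers (again with simulated input and illustrative 
names big, con and nums):

```r
# Simulated input: one long string of newline-separated integers
# (a hypothetical stand-in for the string returned by the XML call).
set.seed(1)
big <- paste(sample.int(99999L, 60000L, replace = TRUE), collapse = "\n")

con <- file()                    # anonymous file connection, opened in "w+" mode
writeLines(big, con)             # write the string out; embedded "\n" become line breaks
nums <- scan(con, quiet = TRUE)  # read the values straight back as a numeric vector
close(con)                       # closing the connection discards the anonymous file
```

This avoids building the intermediate character vector, since scan() 
converts to numeric as it reads.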

> The massive string is the return value of a call to xmlElementsByTagName 
> from the XML library and looks like this:
                ^^^^^^^
'package' or your own C code accessing libxml?

> ...
> 12345
> 564376
> 5674
> 6356656
> 5666
> ...
>
> I've to read about a hundred of these files and was wondering whether there's a more efficient way to turn this string into an array of numerics. Any ideas?
>
> 	thanks a lot for your help
> 	and kind regards,
>
> 	Arne

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


