[R] splitting very long character string

Gabor Grothendieck ggrothendieck at gmail.com
Thu Nov 2 11:32:40 CET 2006


You could use the file= argument on cat to avoid the two calls to sink:

cat(tmp, file = tmp.file)
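
As a rough sketch of the whole round trip, assuming tmp holds the newline-separated string (the sample values below are only a made-up stand-in for what xmlValue returns, and nums is just a name I picked), it might look like this:

```r
# Hypothetical stand-in for the long string returned by xmlValue():
# numbers separated by newlines.
tmp <- paste(c("12345", "564376", "5674", "6356656", "5666"), collapse = "\n")

tmp.file <- tempfile()
cat(tmp, file = tmp.file)             # write the string directly, no sink() needed
nums <- scan(tmp.file, quiet = TRUE)  # parse the file into a numeric vector
unlink(tmp.file)                      # remove the temporary file
```

Writing with file= also means output is not left diverted if something fails between the sink() calls.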

On 11/2/06, Arne.Muller at sanofi-aventis.com
<Arne.Muller at sanofi-aventis.com> wrote:
> Hello,
>
> thanks a lot for your help on splitting the string to get a numeric vector. I'm now writing the string to a tempfile and reading it in via scan - this is fast enough for me:
>
> library(XML);
>
> ...
> tmp = xmlElementsByTagName(root, 'tofDataSample', recursive=T);
> tmp = xmlValue(tmp[[1]]);
> cat(paste('splitting', nchar(tmp), 'string ...\n'));
> tmp.file = tempfile();
> sink(tmp.file);
> cat(tmp);
> sink();
> tmp = scan(tmp.file);
> unlink(tmp.file);
> cat(paste('splitting done,', length(tmp), 'elements\n'));
>
>        thanks again
>        and kind regards,
>
>        Arne
>
> > -----Original Message-----
> > From: john seers (IFR) [mailto:john.seers at bbsrc.ac.uk]
> > Sent: Wednesday, November 01, 2006 17:01
> > To: Muller, Arne PH/FR; r-help at stat.math.ethz.ch
> > Subject: RE: [R] splitting very long character string
> >
> >
> >
> > Hi Arne
> >
> > If you are reading in from files and they are just one number per line
> > it would be more efficient to use scan directly.  ?scan
> >
> > For example:
> >
> > > filen<-"C:/temp/tt.txt"
> > > i<-scan(filen)
> > Read 5 items
> > > i
> > [1]   12345  564376    5674 6356656    5666
> > >
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> > Arne.Muller at sanofi-aventis.com
> > Sent: 01 November 2006 15:47
> > To: r-help at stat.math.ethz.ch
> > Subject: [R] splitting very long character string
> >
> >
> > Hello,
> >
> > I have a very long character string (>500k characters) that I need to split
> > by '\n', resulting in an array of about 60k numbers. The help on strsplit
> > says to use perl=TRUE to get better performance, but it still takes several
> > minutes to split this string.
> >
> > The massive string is the return value of a call to xmlElementsByTagName
> > from the XML library and looks like this:
> >
> > ....
> > 12345
> > 564376
> > 5674
> > 6356656
> > 5666
> > ....
> >
> > I have to read about a hundred of these files and was wondering whether
> > there's a more efficient way to turn this string into an array of
> > numerics. Any ideas?
> >
> >       thanks a lot for your help
> >       and kind regards,
> >
> >       Arne
> >
> >
> >
> >
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>


