[R] Getting many substrings but only loading the original string one time.

Duncan Murdoch murdoch.duncan at gmail.com
Mon Apr 11 22:14:03 CEST 2011


On 11/04/2011 3:48 PM, Jonathan wrote:
> Hi All,
>      I'm looking for a way to get many substrings from a longer string and
> then stitch them together.  But, since the longer string is really, really
> long (like 250 MB long), I don't want to do this in a loop and load and
> re-load the longer string many times.  Does anybody have an idea?
>
> Maybe I could pass in two vectors (the first would have the starting
> coordinates, and the second would have the stopping coordinates), so it
> would be like a vectorized version of substr, where start and stop would be
> vectors instead of single integers.
>
> Example (I'm reducing the size of the string for the example) of how this
> might work:
>
> >  longerString<- "HelloThisIsMyLongerString"
> >  startVector<-  c(2,6,4)
> >  stopVector<- c(4,10,5)
>
> >  substrings<- vectorized_substr(longerString, startVector, stopVector)
> >  substrings
> [1] "ell" "ThisI" "lo"

Use substring(), not substr().  It is vectorized:

 > substring(longerString, startVector, stopVector)
[1] "ell"   "ThisI" "lo"

It does this by replicating longerString to match the length of the start 
and stop vectors, but that doesn't mean actual copies of the string are 
made: the replicated vector just holds multiple pointers to the same big one.
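A minimal sketch of the whole task, reusing the small example values from the thread as a stand-in for the real 250 MB string. It contrasts substr(), which returns one result per element of its first argument (so with a single string only the first start/stop pair is used), with substring(), and then joins the pieces with a single paste(..., collapse = "") call as proposed in the question. It does not try to demonstrate the pointer-sharing point, which is internal to R.

```r
# Example data from the thread (a short stand-in for the very long string)
longerString <- "HelloThisIsMyLongerString"
startVector  <- c(2, 6, 4)
stopVector   <- c(4, 10, 5)

# substr() returns one value per element of its first argument, so with a
# single string only the first start/stop pair is used: returns "ell"
substrFirstOnly <- substr(longerString, startVector, stopVector)

# substring() recycles its arguments to the longest length, giving one
# piece per start/stop pair: returns c("ell", "ThisI", "lo")
substrings <- substring(longerString, startVector, stopVector)

# One paste() call with collapse = "" joins all pieces at once:
# returns "ellThisIlo"
result <- paste(substrings, collapse = "")

print(substrFirstOnly)
print(substrings)
print(result)
```

Because both steps are single vectorized calls, the long string is passed in once rather than being handled inside an R-level loop.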

Duncan Murdoch

> Then I'd like to concatenate them (there will be many of them)
>
> >  result<- paste(substrings,collapse='')
> >  result
> [1] "ellThisIlo"
>
> (perhaps the paste command as I've done it is the best way, but depending on
> how the substrings are reported there may be different ways). Thanks!
>
> Jonathan


