[Rd] In C, a fast way to slice a vector?

Saptarshi Guha saptarshi.guha at gmail.com
Mon May 11 16:25:50 CEST 2009


Impressive stuff. Nice to see people giving some thought to this.
I will explore the packages you mentioned.

Thank you

Saptarshi Guha



On Mon, May 11, 2009 at 12:37 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
> Saptarshi,
> I know of two alternatives you can use to do fast extraction of consecutive
> subsequences of a vector:
>
> 1) Fast copy: the method you mentioned, creating a memcpy'd vector
> 2) Pointer management: create an externalptr object in R and manage the
> start and end of your data
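
The pointer-management idea can be sketched in plain C. This is a minimal, hypothetical sketch: the `byte_view` type and the `view_from`/`view_at` helpers are illustrative names, not part of R's C API or of IRanges/Biostrings. A slice is represented as a borrowed pointer plus an offset and a length, so creating it copies nothing. Inside R, such a struct would presumably be wrapped with `R_MakeExternalPtr`, passing the parent RAWSXP as its `prot` argument so the garbage collector keeps the underlying bytes alive; that wiring is an assumption and is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "view" type: a borrowed pointer plus an offset and a length.
   No bytes are copied; the view is only valid while the parent buffer lives. */
typedef struct {
    const unsigned char *base;  /* start of the parent buffer (not owned) */
    size_t start;               /* 0-based offset of the first element */
    size_t length;              /* number of elements in the slice */
} byte_view;

/* Make a view of buf[start .. len-1] in O(1) time, without copying. */
byte_view view_from(const unsigned char *buf, size_t len, size_t start)
{
    assert(start <= len);       /* an empty tail is allowed, out of range is not */
    byte_view v = { buf, start, len - start };
    return v;
}

/* Read element i of the view (0-based, relative to the view's start). */
unsigned char view_at(byte_view v, size_t i)
{
    assert(i < v.length);
    return v.base[v.start + i];
}
```

The cost of this design is lifetime management: the view must never outlive the buffer it points into, which is why the R version needs the parent vector protected.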
>
> If you are looking for a prototyping environment to try, I recommend using
> the IRanges and Biostrings packages from the Bioconductor project. The
> IRanges package contains a function called subseq for performing 1) on all
> basic vector types (raw, logical, integer, etc.), and the Biostrings package
> contains a subseq method on an externalptr-based class that implements 2).
>
> I was going to lobby R core members quietly about adding something akin to
> subseq from IRanges into base R since it is extremely useful for all long
> vectors and could replace every a:b call (where a <= b) in R code, but this
> publicity can't hurt.
>
> Here is an example:
>
>> source("http://bioconductor.org/biocLite.R")
>> biocLite(c("IRanges", "Biostrings"))
>
> << download output omitted >>
>>
>> suppressMessages(library(Biostrings))
>> x <- rep(charToRaw("a"), 1e7)
>> y <- BString(rawToChar(x))
>> system.time(x[13:1e7])
>
>   user  system elapsed
>  0.304   0.073   0.378
>>
>> system.time(subseq(x, 13))
>
>   user  system elapsed
>  0.011   0.007   0.019
>>
>> system.time(subseq(y, 13))
>
>   user  system elapsed
>  0.003   0.000   0.004
>>
>> identical(x[13:1e7], subseq(x, 13))
>
> [1] TRUE
>>
>> identical(x[13:1e7], charToRaw(as.character(subseq(y, 13))))
>
> [1] TRUE
>>
>> sessionInfo()
>
> R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
> i386-apple-darwin9.6.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.13.5 IRanges_1.3.5
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.2
>
>
>
> Quoting Saptarshi Guha <saptarshi.guha at gmail.com>:
>
>> Hello,
>> Suppose that in the following code,
>> PROTECT(sr = R_tryEval( .... ))
>>
>> sr is a RAWSXP vector. I wish to return another RAWSXP starting at
>> position 13 (0-based) onwards.
>>
>> I could create another RAWSXP of the correct length and then memcpy
>> the required bytes into this new one.
>>
>> However, is there a more efficient method?
>>
>> Regards
>> Saptarshi Guha
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
>
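
The copy approach from the original question can be sketched as a plain-C analogue. Assumptions: ordinary `malloc`'d byte buffers stand in for R vectors, and `copy_tail` is a hypothetical helper. Inside an R extension the corresponding steps would be roughly `PROTECT(ans = allocVector(RAWSXP, n))` followed by `memcpy(RAW(ans), RAW(sr) + 13, n)` with `n = LENGTH(sr) - 13`, and R's allocator, not `free()`, would own the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy buf[start .. len-1] into a freshly allocated buffer.
   Costs O(len - start) time and memory; the caller must free() the result.
   Returns NULL (with *out_len set to 0) if start > len or allocation fails. */
unsigned char *copy_tail(const unsigned char *buf, size_t len, size_t start,
                         size_t *out_len)
{
    assert(out_len != NULL);
    *out_len = 0;
    if (start > len)
        return NULL;
    size_t n = len - start;
    /* malloc(0) may return NULL, so allocate at least one byte for an empty tail */
    unsigned char *out = malloc(n > 0 ? n : 1);
    if (out == NULL)
        return NULL;
    memcpy(out, buf + start, n);   /* the actual copy of the tail bytes */
    *out_len = n;
    return out;
}
```

The trade-off is that every slice pays for an allocation and a copy proportional to its length, whereas the pointer-management approach mentioned in the thread avoids the copy but must keep the parent vector alive.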



More information about the R-devel mailing list