[BioC] Newcomer's question on subsetting IRangesList

Tomas Bjorklund [guest] guest at bioconductor.org
Mon May 12 19:32:47 CEST 2014


Hi, 

I'm new to R and Bioconductor, so this is probably a trivial question, but I cannot find a solution anywhere.

In my workflow, I currently use a temporary version of vmatchPattern (found on the net) that allows for indels. This works great, but it outputs an IRangesList object that I have trouble subsetting. Here is an example of the output:

IRangesList of length 96979
[[1]]
IRanges of length 2
    start end width
[1]     1   7     7
[2]   278 283     6

[[2]]
IRanges of length 2
    start end width
[1]     1   7     7
[2]   281 286     6

[[3]]
IRanges of length 2
    start end width
[1]     1   7     7
[2]   256 261     6

...
<96976 more elements>

In this case, the same sequence is found twice in each read. What I would like to extract is the "end" of the first occurrence of the string in each read, i.e., 7 in the cases above.

Say that matchList is the IRangesList object. If I use end(matchList), I get a list containing the ends of both the first and the second occurrence for each read. Every way I have tried to subset it gives an error. I can get it to work by going through as.data.frame, but this is very slow when you have millions of matches, as in my case.
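
To make the desired result concrete, here is a minimal base-R sketch. It does not use the IRanges API: a plain list of integer vectors stands in for what end(matchList) returns, and the helper name first_ends is hypothetical. It shows one way to pull the first value per read without looping over the elements, under the assumption that the per-read ends can be flattened into a single vector.

```r
# Hypothetical helper: first element of each integer vector in a list,
# computed on the flattened vector instead of element by element.
first_ends <- function(ends) {
  # Number of matches per read.
  n_matches <- vapply(ends, length, integer(1))
  # All ends in one vector, in read order.
  flat_ends <- unlist(ends, use.names = FALSE)
  # Position of each read's first match within flat_ends.
  first_pos <- cumsum(n_matches) - n_matches + 1L
  # Reads without any match get NA rather than a neighbour's value.
  out <- rep(NA_integer_, length(ends))
  has_match <- n_matches > 0L
  out[has_match] <- flat_ends[first_pos[has_match]]
  out
}

# Toy stand-in for end(matchList), copied from the three elements shown above.
ends <- list(c(7L, 283L), c(7L, 286L), c(7L, 261L))
first_end <- first_ends(ends)
first_end
```

For the three reads shown above, the wanted result is 7 for each read. Whether the same flatten-and-index idea carries over directly to the IRangesList object depends on the accessors available in the installed IRanges version.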

I hope that this was reasonably clear.

Thank you all for your help

All the best

Tomas



 -- output of sessionInfo(): 

R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] xlsx_0.5.5              muscle_3.8.31-2         Rlibstree_0.3-2         xlsxjars_0.6.0          rJava_0.9-6             ShortRead_1.22.0       
 [7] GenomicAlignments_1.0.1 BSgenome_1.32.0         Rsamtools_1.16.0        GenomicRanges_1.16.3    GenomeInfoDb_1.0.2      Biostrings_2.32.0      
[13] XVector_0.4.0           IRanges_1.22.6          BiocParallel_0.6.0      BiocGenerics_0.10.0    

loaded via a namespace (and not attached):
 [1] BatchJobs_1.2       BBmisc_1.6          Biobase_2.24.0      bitops_1.0-6        brew_1.0-6          codetools_0.2-8     DBI_0.2-7           digest_0.6.4       
 [9] fail_1.2            foreach_1.4.2       grid_3.1.0          hwriter_1.3         iterators_1.0.7     lattice_0.20-29     latticeExtra_0.6-26 plyr_1.8.1         
[17] RColorBrewer_1.0-5  Rcpp_0.11.1         RSQLite_0.11.4      sendmailR_1.1-2     stats4_3.1.0        stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0  

--
Sent via the guest posting facility at bioconductor.org.


