[BioC] BStringSet not work with lists of many DNAString elements

Hervé Pagès hpages at fhcrc.org
Tue Oct 22 20:58:25 CEST 2013


Hi,

I cannot reproduce this.
I've tried to call BStringSet() on a list of 100 DNAString
objects of 25 million letters each and it worked.
Can you please provide a self-contained reproducible example?
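
For illustration, a self-contained example could look something like the sketch below. This is not the exact code I ran: the sequences are random and much shorter, and random_dna() is a made-up helper for this sketch, not part of Biostrings. It assumes DNAStringSet() accepts the same kind of list, which may be the more natural constructor for DNA before calling writeXStringSet().

```r
library(Biostrings)

set.seed(1)

# Hypothetical helper (not a Biostrings function): one random DNAString
# of length n.
random_dna <- function(n) {
  letters <- sample(c("A", "C", "G", "T"), n, replace = TRUE)
  DNAString(paste(letters, collapse = ""))
}

# A named list of 100 DNAString objects, mimicking exonSeq.list.
dna_list <- lapply(seq_len(100), function(i) random_dna(1000))
names(dna_list) <- paste0("seq", seq_along(dna_list))

# The conversion reported as failing in the original post.
bset <- BStringSet(dna_list)

# Assumed alternative: keep the DNA alphabet instead of the generic
# BString alphabet.
dset <- DNAStringSet(dna_list)

# Write the set to a FASTA file (a temporary file for this sketch).
out_file <- tempfile(fileext = ".fa")
writeXStringSet(dset, filepath = out_file, format = "fasta")
```

If something like this fails with your real data, sending the smallest subset of exonSeq.list that triggers the error (for example via save()) would let others reproduce it.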

Thanks,
H.


On 10/18/2013 10:46 AM, heyi xiao wrote:
> Dear all,
> I tried to use the Biostrings/BSgenome utilities to extract DNA sequences for Entrez genes. It worked fine until I was ready to write the extracted sequences to a FASTA file. writeXStringSet is the only function for writing FASTA files, and it only works with an XStringSet object, so I need to convert my list of DNAString objects into an XStringSet object. Unfortunately, the converter/constructor BStringSet only works with lists of a few DNAString elements and produces an error on larger lists, as shown below. I am not sure how to deal with the issue. Thanks in advance for any suggestions/input!
> Heyi
>
>>   exonSeq.set=BStringSet(exonSeq.list[1:30])
> Error in .Call2("SharedVector_mcopy", dest, dest.offset, src, src.start,  :
>    subscript out of bounds
>>   exonSeq.set=BStringSet(exonSeq.list[1:25])
>>   exonSeq.set=BStringSet(exonSeq.list[1:26])
> Error in .Call2("SharedVector_mcopy", dest, dest.offset, src, src.start,  :
>    subscript out of bounds
>>   exonSeq.set=BStringSet(exonSeq.list[26:30])
>>   exonSeq.set=BStringSet(exonSeq.list[26:40])
> Error in .Call2("SharedVector_mcopy", dest, dest.offset, src, src.start,  :
>    subscript out of bounds
>
>> head(exonSeq.list,3)
> $`442993`
>    133057-letter "DNAString" instance
> seq: TGAGACGGCTTTTATTCCTGAGCTTCTGCTGCTCAC...AAAGCTGTCATCAATGAAAAAAGGTAAGAGAAAAAC
>
> $`442994`
>    23917-letter "DNAString" instance
> seq: CAGTTCTGACCCACTTCAAGGTTACATCTCCAAGGT...CTTACGATTTTTGCAGATAAAAAATTTATCTGCAAA
>
> $`442995`
>    21718-letter "DNAString" instance
> seq: GTCTTCTCTCCTTGCTGCTCTCAGGTAGGGGCTGGG...GGAAGAAGCAGAATAAAGCAATTTTCCTTGAAGTGA
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] BSgenome.Oaries.NCBI.Oar3.1_1.0 Biobase_2.21.6
> [3] BSgenome_1.29.0                 Biostrings_2.29.14
> [5] GenomicRanges_1.13.35           XVector_0.1.0
> [7] IRanges_1.19.19                 BiocGenerics_0.7.3
>
> loaded via a namespace (and not attached):
> [1] stats4_3.0.1 tools_3.0.1
>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


