[BioC] write.XStringView/write.XStringSet highly inefficient

Michael Dondrup Michael.Dondrup at uni.no
Tue Jul 27 13:56:37 CEST 2010


Hi,

I was trying to use write.XStringViews on a larger dataset, but to no avail. It does not seem to be
implemented efficiently. Here is what I am trying:

I downloaded http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/chr1.fa.gz and uncompressed it to chr1.fa.

> library(Biostrings)
> dnasts <- read.DNAStringSet(file="chr1.fa")
# break up the first sequence of the FASTA file into segments of width 60
> dnaviews <- Views(dnasts[[1]], start = seq(1, length(dnasts[[1]]), 60), width=60)
> write.XStringViews(dnaviews, file="out.fa")
... I interrupted the process after 1 h, by which point memory usage had peaked at over 3 GB.
In principle, the whole task should not take longer than a few seconds. I found this report:
https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2010-April/001160.html
I guess that is the same problem? Has there been any progress?

Is there perhaps a more efficient way of implementing this, e.g. using cat()?
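
To make the question concrete, here is a rough sketch of the kind of thing I have in mind, using base R only. It assumes the goal is a single FASTA record wrapped at 60 columns, and that the sequence is already available as one character string. write_fasta_wrapped() and its arguments are names I made up for illustration; it is not part of Biostrings, and I have not timed it on chr1-sized input.

```r
# Hypothetical helper (not a Biostrings function): write one long sequence
# as a single FASTA record with fixed-width lines, using base R only.
write_fasta_wrapped <- function(x, name, file, width = 60L,
                                lines_per_chunk = 10000L) {
  stopifnot(is.character(x), length(x) == 1L,
            width >= 1L, lines_per_chunk >= 1L)
  con <- file(file, open = "w")
  on.exit(close(con))
  writeLines(paste0(">", name), con)   # FASTA header line

  n <- nchar(x)
  chunk_size <- width * lines_per_chunk
  pos <- 1
  # Walk through the sequence in chunks so that only a bounded number of
  # line-sized substrings exists in memory at any time.
  while (pos <= n) {
    chunk  <- substr(x, pos, min(pos + chunk_size - 1, n))
    starts <- seq(1, nchar(chunk), by = width)
    # substring() truncates at the end of the chunk, so the last line
    # may be shorter than 'width'.
    writeLines(substring(chunk, starts, starts + width - 1), con)
    pos <- pos + chunk_size
  }
  invisible(NULL)
}

# Toy usage: a 160-letter sequence written to a temporary file.
toy <- paste(rep("ACGT", 40), collapse = "")
tmp <- tempfile(fileext = ".fa")
write_fasta_wrapped(toy, "toy", tmp)
readLines(tmp)
```

For the real data, the input would presumably be something like as.character(dnasts[[1]]), which means holding the whole chromosome as one R string; whether that is acceptable is part of what I am asking.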

Thanks a lot
Michael

> sessionInfo()
R version 2.11.1 (2010-05-31) 
x86_64-unknown-linux-gnu 

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.16.9 IRanges_1.6.8    

loaded via a namespace (and not attached):
[1] Biobase_2.8.0
> 


