[BioC] Biostrings readDNAMultipleAlignment broken for fasta input

Janet Young jayoung at fhcrc.org
Fri Jul 4 02:26:18 CEST 2014


Hi there,

I think I've found a bug in Biostrings: readDNAMultipleAlignment fails when reading fasta input files (my preferred sequence format for a lot of work outside of R).  There's an easy workaround I can use, but I thought you'd want to know anyway.  The code below shows what I mean.

Thanks!

Janet

----------------------------

library(Biostrings)

## make a test fasta-format alignment file
mySeqs  <- DNAStringSet ( c("AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                            "AGTGA-GTGATCGGTAG-TGATGGTAGTT",
                            "AGTGAGGTGATCGGTAGCTGATGCTAGTT",
                            "---GAGGAGATCGGTAGCTGTTGCTAGTT") )
names(mySeqs) <- c("seq1","seq2","seq3","seq4")
writeXStringSet( mySeqs, filepath="temp.fa")

### try reading it using readDNAMultipleAlignment
myAln <- readDNAMultipleAlignment("temp.fa", format="fasta")
# Error in XStringSet("DNA", x, start = start, end = end, width = width,  : 
#  error in evaluating the argument 'x' in selecting a method for function 'XStringSet': Error in isTRUEorFALSE(seek.first.rec) : 
#  argument "seek.first.rec" is missing, with no default


### workaround:
myAln2 <- readDNAStringSet("temp.fa", format="fasta")
myAln2 <- DNAMultipleAlignment(myAln2)
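
### a possible wrapper for the workaround, in case it's useful. This is a
### sketch only: readFastaAlignment is a hypothetical helper name I made up,
### not a Biostrings function, and it assumes the input is DNA in fasta format.
readFastaAlignment <- function(path) {
    seqs <- readDNAStringSet(path, format="fasta")
    ## all rows of an alignment must have the same width, so check up front
    if (length(unique(width(seqs))) > 1) {
        stop("sequences in ", path, " differ in width, so this is not an alignment")
    }
    DNAMultipleAlignment(seqs)
}
### usage: myAln3 <- readFastaAlignment("temp.fa")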

sessionInfo()

R version 3.1.0 Patched (2014-05-26 r65771)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Biostrings_2.33.10  XVector_0.5.6       IRanges_1.99.16    
[4] S4Vectors_0.0.9     BiocGenerics_0.11.2

loaded via a namespace (and not attached):
[1] stats4_3.1.0    zlibbioc_1.11.1


