[BioC] Biostrings readDNAMultipleAlignment broken for fasta input

Fri Jul 4 03:26:40 CEST 2014

Hi Janet,

It's funny that I received a bug report for this same issue off-list
from someone else just a few minutes before your post. Sounds like
you guys are collaborating on the same project and running into the
same bugs ;-)

This is fixed in Biostrings 2.32.1 (release) and 2.33.12 (devel).
Neither will be available through biocLite() before Saturday morning,
but you can get them now from svn.
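
To check whether an installed copy already includes the fix, something
like the sketch below may help. The helper name has_fasta_fix is
hypothetical (it is not part of Biostrings), and it assumes that 2.33.x
is the devel series and anything lower is release, based only on the two
version numbers given above. Releases older than 2.32 may not have the
bug at all, so a FALSE for those only means "older than the fixed
version".

```r
# Hypothetical helper, not part of Biostrings: compares a version string
# against the fixed versions named in this message (2.32.1 release,
# 2.33.12 devel).
has_fasta_fix <- function(version) {
  v <- package_version(version)
  if (v >= "2.33.0") {
    v >= "2.33.12"  # assumed devel series
  } else {
    v >= "2.32.1"   # assumed release series
  }
}

# The version reported in the sessionInfo() of the original post:
has_fasta_fix("2.33.10")  # FALSE
```

With Biostrings installed, the same check can be run on the local copy
with has_fasta_fix(as.character(packageVersion("Biostrings"))).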

Cheers,
H.

On 07/03/2014 05:26 PM, Janet Young wrote:
> Hi there,
>
> I think I found a broken function in Biostrings: readDNAMultipleAlignment fails to read fasta input files (my preferred sequence format for a lot of stuff outside of R). There's an easy workaround I can use, but I thought you'd want to know anyway. The code below shows what I mean.
>
> Thanks!
>
> Janet
>
> ----------------------------
>
> library(Biostrings)
>
> ## make a test fasta-format alignment file
> mySeqs  <- DNAStringSet ( c("AGTGAGGTGATCGGTAGCTGATGCTAGTT",
>                              "AGTGA-GTGATCGGTAG-TGATGGTAGTT",
>                              "AGTGAGGTGATCGGTAGCTGATGCTAGTT",
>                              "---GAGGAGATCGGTAGCTGTTGCTAGTT") )
> names(mySeqs) <- c("seq1","seq2","seq3","seq4")
> writeXStringSet( mySeqs, filepath="temp.fa")
>
> ### try reading it using readDNAMultipleAlignment
> myAln <- readDNAMultipleAlignment("temp.fa", format="fasta")
> # Error in XStringSet("DNA", x, start = start, end = end, width = width,  :
> #  error in evaluating the argument 'x' in selecting a method for function 'XStringSet': Error in isTRUEorFALSE(seek.first.rec) :
> #  argument "seek.first.rec" is missing, with no default
>
>
> ### workaround:
> myAln2 <- readDNAStringSet("temp.fa", format="fasta")
> myAln2 <- DNAMultipleAlignment(myAln2)
>
> sessionInfo()
>
> R version 3.1.0 Patched (2014-05-26 r65771)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] Biostrings_2.33.10  XVector_0.5.6       IRanges_1.99.16
> [4] S4Vectors_0.0.9     BiocGenerics_0.11.2
>
> loaded via a namespace (and not attached):
> [1] stats4_3.1.0    zlibbioc_1.11.1

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319