[BioC] Biostring: print sequence alignment to file

Hervé Pagès hpages at fhcrc.org
Tue Apr 17 21:43:30 CEST 2012


Hi Thomas,

On 04/17/2012 11:49 AM, Thomas Girke wrote:
> What about providing an option in pairwiseAlignment to output to the
> MultipleAlignment class in Biostrings and then write the latter to
> different alignment formats?

Or we could provide coercion methods to switch between
PairwiseAlignedXStringSet and MultipleAlignment.

Anyway, that just moves Martin's problem: instead of needing a
write.PairwiseAlignedXStringSet() function that produces BLAST output,
we would need a write.MultipleAlignment() function that produces BLAST
output. For the specific case of BLAST output, would it make sense
to support it for MultipleAlignment? Can someone point me to an example
of such output? Or even better, to the specs of such a format?
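
For reference, the block layout Martin describes further down (aligned
pattern, a match line, aligned subject, wrapped at a fixed width) can be
sketched in base R. This is only a rough illustration, not a Biostrings
function and not the BLAST specification: format_pairwise_blocks() is a
hypothetical helper, the 60-column default comes from Martin's suggestion,
and it assumes the two gapped strings have already been extracted from
the alignment as plain character strings of equal length.

```r
# Hypothetical helper (not part of Biostrings): lay out two already-aligned
# strings, with gaps written as "-", in fixed-width blocks with a match line.
format_pairwise_blocks <- function(pattern, subject, width = 60L) {
  stopifnot(is.character(pattern), length(pattern) == 1L,
            is.character(subject), length(subject) == 1L,
            nchar(pattern) == nchar(subject), nchar(pattern) > 0L,
            width >= 1L)
  p <- strsplit(pattern, "", fixed = TRUE)[[1]]
  s <- strsplit(subject, "", fixed = TRUE)[[1]]
  # "|" marks identical, non-gap columns; every other column is left blank.
  m <- ifelse(p == s & p != "-", "|", " ")
  starts <- seq(1L, length(p), by = width)
  blocks <- lapply(starts, function(i) {
    j <- min(i + width - 1L, length(p))
    c(paste(p[i:j], collapse = ""),
      paste(m[i:j], collapse = ""),
      paste(s[i:j], collapse = ""),
      "")  # blank line between blocks
  })
  unlist(blocks)
}

# Martin's example strings
out <- format_pairwise_blocks("AGT-TCTAT", "AGTATCTAT")
writeLines(out)
```

The result is a plain character vector, so writeLines(out, "alignment.txt")
would send it to a file instead of the console.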

Note that right now Biostrings has the write.phylip() function for
writing a MultipleAlignment object to a file, but the Phylip format
looks very different from the BLAST output:

hpages@latitude:~$ head -n 20 phylip_test.txt
  9 2343
Mask      0000000000 0000000000 0000000000 0000000000 0000000000
Human     -----TCCCG TCTCCGCAGC AAAAAAGTTT GAGTCGCCGC TGCCGGGTTG
Chimp     ---------- ---------- ---------- ---------- ----------
Cow       ---------- ---------- ---------- ---------- ----------
Mouse     ---------- ---------- --AAAAGTTG GAGTCTTCGC TTGAGAGTTG
Rat       ---------- ---------- ---------- ---------- ----------
Dog       ---------- ---------- ---------- ---------- ----------
Chicken   ---------- ----CGGCTC CGCAGCGCCT CACTCGCGCA GTCCCCGCGC
Salmon    GGGGGAGACT TCAGAAGTTG TTGTCCTCTC CGCTGATAAC AGTTGAGATG

           0000000000 0000000000 0000000000 0001111111 1111111111
           CCAGCGGAGT CGCGCGTCGG GAGCTACGTA GGGCAGAGAA GTCA-TGGCT
           ---------- ---------- ---------- ---------- ---A-TGGCT
           ---------- ---------- ---------- ---GAGAGAA GTCA-TGGCT
           CCAGCGGAGT CGCGCGCCGA CAGCTACGCG GCGCAGA-AA GTCA-TGGCT
           ---------- ---------- ---------- ---------- ---A-TGGCT
           ---------- ---------- ---------- ---------- ---A-TGGCT
           AGGGCCGGGC AGAGGCGCAC GCAGCTCCCC GGGCGGCCCC GCTC-CAGCC
           CGCATATTAT TATTACCTTT AGGACAAGTT GAATGTGTTC GTCAACATCT

Thanks!
H.

>
> Thomas
>
> On Tue, Apr 17, 2012 at 05:59:24PM +0000, Hervé Pagès wrote:
>> Hi Martin,
>>
>> On 04/16/2012 04:06 AM, Martin Preusse wrote:
>>> Hi Charles,
>>>
>>> thanks! Your solution makes it possible to print the two alignment strings separately.
>>>
>>> I was thinking of an output as generated by alignment tools:
>>>
>>> AGT-TCTAT
>>> ||| |||||
>>> AGTATCTAT
>>
>> This looks like BLAST output. Is this what you have in mind? Note that
>> there are many alignment tools and many ways to output the result to a
>> file. I'm not really familiar with the BLAST output format. Is it
>> specified somewhere? Would it make sense to add something like a
>> write.PairwiseAlignedXStringSet() function to Biostrings for writing
>> the result of pairwiseAlignment() to a file? We could do this and
>> support the BLAST format if that's a commonly used format.
>>
>> Thanks,
>> H.
>>
>>>
>>> For this I would have to write a function to output the strings in blocks of e.g. 60 nucleotides, right?
>>>
>>> Cheers
>>> Martin
>>>
>>>
>>>
>>> On Friday, 13 April 2012 at 19:21, Chu, Charles wrote:
>>>
>>>> write.XStringSet
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319



