[BioC] OTU delimitation from simple fasta (sanger) sequences

Martin Unterseher martin.unterseher at uni-greifswald.de
Thu Aug 16 14:30:12 CEST 2012


Dear all, 

I have struggled with readOTUset{OTUbase} for some time and have searched the web and the R mailing-list archives, including this one, without success.

OTUbase is obviously designed for NGS datasets that have already passed through specific 454 pipelines. What I am looking for is a convenient method to delimit OTUs from a simple FASTA file of Sanger sequences, such as the one below (fungal ITS sequences), with the possibility to specify the clustering criteria, e.g. 97% sequence similarity over at least 90% of the sequence length.
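The requested criterion can be sketched as a greedy centroid clustering, similar in spirit to tools such as CD-HIT or UCLUST but not OTUbase's own method. The Python below is a minimal sketch under stated assumptions: each pair is compared by an overlap alignment with free end gaps (match +1, mismatch -1, gap -2, all arbitrary choices), identity is the number of matches divided by the number of alignment columns, and coverage is the fraction of the shorter sequence that lies inside the alignment. The names overlap_stats and greedy_otus are hypothetical.

```python
# Sketch: greedy OTU clustering by pairwise overlap alignment.
# Assumptions (not from OTUbase): identity = matches / alignment columns,
# coverage = fraction of the shorter sequence inside the alignment.

def overlap_stats(a, b, match=1, mismatch=-1, gap=-2):
    """Align a and b with free end gaps; return (identity, coverage)."""
    if not a or not b:
        return 0.0, 0.0
    n, m = len(a), len(b)
    # Row 0 and column 0 stay 0, so leading overhangs are not penalised.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            best = max(diag, up, left)
            score[i][j] = best
            move[i][j] = "D" if best == diag else ("U" if best == up else "L")
    # Trailing overhangs are free as well: the alignment may end anywhere
    # in the last row or the last column (first maximum wins on ties).
    ends = [(n, j) for j in range(m + 1)] + [(i, m) for i in range(n + 1)]
    end_i, end_j = max(ends, key=lambda cell: score[cell[0]][cell[1]])
    # Trace back until one sequence's leading overhang is reached.
    i, j = end_i, end_j
    matches = columns = 0
    while i > 0 and j > 0:
        step = move[i][j]
        columns += 1
        if step == "D":
            if a[i - 1] == b[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        elif step == "U":
            i -= 1
        else:
            j -= 1
    identity = matches / columns if columns else 0.0
    covered_a, covered_b = end_i - i, end_j - j
    coverage = covered_a / n if n <= m else covered_b / m
    return identity, coverage


def greedy_otus(records, min_identity=0.97, min_coverage=0.90):
    """Assign each (name, sequence) record to an OTU index (0, 1, 2, ...).

    Longest sequences are visited first and found new OTUs as centroids;
    every other sequence joins the first centroid passing both thresholds.
    """
    centroids = []  # one representative sequence per OTU, in creation order
    assignment = {}
    for name, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        for otu, centroid in enumerate(centroids):
            identity, coverage = overlap_stats(seq, centroid)
            if identity >= min_identity and coverage >= min_coverage:
                assignment[name] = otu
                break
        else:  # no existing centroid is close enough: start a new OTU
            assignment[name] = len(centroids)
            centroids.append(seq)
    return assignment
```

Visiting sequences longest-first makes the longest read of each OTU its centroid; as with any greedy scheme, the result can depend on this order, and the pairwise alignment is only practical for small Sanger datasets of this size.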

The FASTA header >VASmic02 means "sequence 02 from host plant VASmic". This example file consists of 10 sequences from 3 host plants: VASmic, TILusn and HEVbra.
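Reading the file and recovering the host plant from each header could look like the Python sketch below. It assumes, based only on the naming scheme described above, that the host code is the leading run of letters in the header (so VASmic06_1 maps to VASmic); read_fasta and host_of are hypothetical helper names.

```python
import re

def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep the whole header (minus '>') as the sequence name.
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def host_of(name):
    """Assumed naming scheme: leading letters are the host code."""
    match = re.match(r"[A-Za-z]+", name)
    if match is None:
        raise ValueError(f"no host code in header {name!r}")
    return match.group()
```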

>VASmic02
ACCGGGATGTTCATAACCCTTTGTTGTCCGACTCTGTTGCCTCCGGGGCGACCCTGCCTTCGGGCGGGGGCTCCGGGTGGACACTTCAAACTCTTGCGTAACTTTGCAGTCTGAGTAAACTTAATTAATAAATTACACCACTCAAGCCTCGCTTGGTATTGGGCAACGCGGTCCGCCGCGTGCCTCAAATCGACCGGCTGGGTCTTCTGTCCCCTAAGCGTTGTGGAAACTATTCGCTAAAGGGTGTTCGGGAGGCTACGCCGTAAAACAACCCCATTTCTAAGG
>VASmic05
CCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGTGGGTTCGCCCGCCGATCGGACAACATTCAAACCCTTTGCAGTTGCAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTCTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAGTACATTTTTAACTC
>VASmic06_1
TACCATCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAAACTTAAACCCTTTGTAATTGAAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTTTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAAGTACTTTTTACACTC
>TILusn11
TGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAAACTTAAACCCTTTGTAATTGAAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTTTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAAGTACTTTTTACACTC
>TILusn12
GTATTATTACTTTGTTGCTTTGGCGAGCTGCCTTCGGGCCTTGTATGCTCGCCAGAGAATACCAAAACTCTTTTTATTAATGTCGTCTGAGTACTATATAATAGTTACAACCCTCAAGCTTAGCTTGGTATTGAGTCTATGTCAGTAATGGCAGGCTCTAAAATCAGTGGCGGCGCCGCTGGGTCCTGAACGTAGTAATATCTCTCGTTACAGGTTCTCGGTGTGCTTCTGCCAAAACCCAAATTTTTCTATGG
>VASmic14
ATCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAAACTTAAACCCTTTGTAATTGAAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTTTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAAGTACTTTTTACACTC
>VASmic16
CTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGGTTGGACAACATTCAAACCCTTTGCAGTTGCAATCAGCGTCTGAAAAAACTTAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTTGTCTCGCCTCCGCGCGCAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAGTACATTTTTACACTC
>HEVbra17
ACCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAACATTCAAACCCTTTGCAGTTGCAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCTCCTCTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAGTACATTTTTACACTC
>HEVbra18
CTACCATCTCTTACCCATGTCTTTTGAGTACCTTCGTTTCCTCGGCGGGTCCGCCCGCCGATTGGACAAACTTAAACCCTTTGTAATTGAAATCAGCGTCTGAAAAAACATAATAGTTAGTACCTTCAAGCTCTGCTTGGTGTTGGGTGTTTGTCTCGCCTTTGCGTGTAGACTCGCCTTAAAACAATTGGCAGCCGGCGTATTGATTTCGGAGCGCAGTACATCTCGCGCTTTGCACTCATAACGACGACGTCCAAAAAGTACTTTTTACACTC
>VASmic21
ACCTTACCAAACTGTTGCCTCGGCGGGGTCACGCCCCGGGTGCGTCGCAGCCCCGGAACCAGGCGCCCGCCGGAGGGACCAACCAAACTCTTTCTGTAATCCCCTCGCGGACGTTATTTTTACAGCTCTGAGCAAAAATTCAAAATGAATCACAACCCTCGAACCCCTCCGGGGGTCCGGCGTTGGGGATCGGGAACCCCTAAGACGGGATCCCGGCCCCGAAATACAGTGGCGGTCTCGCCGCAGCCTCTCATGCGCAGTAGTTTGCACAACTCGCACCGGGAGCGCGGCGCGTCCACGTCCGTAAAACACCCAACTTCTGAAATG

There are surely several reasonable output formats. One possibility is a table of OTU counts per host plant like the following (e.g. as a data frame), which would allow subsequent diversity analyses with vegan, e.g. specaccum, metaMDS, etc.

	VASmic	TILusn	HEVbra
OTU.01	4	1	1
OTU.02	2	1	0
OTU.03	0	0	1
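Given an OTU index per sequence (from whatever clustering step is used) and a host code per header, the table above is a simple cross-tabulation. The Python sketch below prints it in the same tab-separated layout; the input dictionary and the hypothetical helper host_code (leading letters of the header) are assumptions. Note that vegan functions such as specaccum and metaMDS expect sites in rows and species in columns, so for those this table would be transposed (hosts as rows, OTUs as columns).

```python
import re
from collections import Counter

def host_code(name):
    """Hypothetical helper: leading letters of a header ('VASmic06_1' -> 'VASmic')."""
    match = re.match(r"[A-Za-z]+", name)
    if match is None:
        raise ValueError(f"no host code in header {name!r}")
    return match.group()


def otu_table(assignment):
    """Cross-tabulate {sequence name: OTU index} into OTU-by-host count rows."""
    counts = Counter((otu, host_code(name)) for name, otu in assignment.items())
    hosts = []  # host columns in first-seen order
    for name in assignment:
        if host_code(name) not in hosts:
            hosts.append(host_code(name))
    lines = ["\t" + "\t".join(hosts)]
    for otu in sorted(set(assignment.values())):
        cells = (str(counts[(otu, host)]) for host in hosts)
        # OTU indices start at 0; labels follow the OTU.01, OTU.02, ... style
        lines.append(f"OTU.{otu + 1:02d}\t" + "\t".join(cells))
    return "\n".join(lines)
```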


I hope someone can help me with this.

Best
Martin

__________

PD Dr. Martin Unterseher
Universität Greifswald
Institut für Botanik und Landschaftsökologie
Grimmer Str. 88
17487 Greifswald

Tel. 03834 / 864184
Fax. 03834 / 864114

http://www.botanik.uni-greifswald.de/100.html
