[BioC] how to go from a short-read alignment file to a SNP table for population genetic analysis

Mao Jianfeng jianfeng.mao at gmail.com
Mon Dec 6 15:54:37 CET 2010


Dear Bioconductor listers,

I am new to genomics and bioinformatics. In my current study, we have
sequenced the genomes of tens of accessions of a plant on an Illumina
next-generation sequencer. The short reads of each accession have been
aligned to the reference genome, and SNPs and short indels have been
called for each accession against the reference. For each accession we
obtained a SNP data set in the following format (a text file with the
column names listed in a header line; the accession name is the same on
every line of a given accession's file):

<accession name><chromosome><position><reference base><cons base><quality><support><concordance><avg_hits>
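A minimal Python sketch of reading one such per-accession file into typed records. It assumes the file is tab-separated, has a single header line, and lists the columns in the order shown above. The field names (`ref_base`, `cons_base`, and so on) and the numeric types chosen for the quality, support, concordance and avg_hits columns are hypothetical labels for illustration, not the SNP caller's documented output specification.

```python
import csv
import io
from dataclasses import dataclass

# Assumed column order, taken from the header layout described above;
# these field names are hypothetical labels, not the caller's own.
COLUMNS = ["accession", "chromosome", "position", "ref_base", "cons_base",
           "quality", "support", "concordance", "avg_hits"]


@dataclass
class SnpCall:
    accession: str
    chromosome: str
    position: int
    ref_base: str
    cons_base: str
    quality: float      # numeric types for the score columns are assumptions
    support: float
    concordance: float
    avg_hits: float


def read_snp_calls(handle):
    """Parse one accession's SNP file (assumed tab-separated, one header line)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line with the column names
    calls = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}: {row}")
        rec = dict(zip(COLUMNS, row))
        calls.append(SnpCall(
            accession=rec["accession"],
            chromosome=rec["chromosome"],
            position=int(rec["position"]),
            ref_base=rec["ref_base"],
            cons_base=rec["cons_base"],
            quality=float(rec["quality"]),
            support=float(rec["support"]),
            concordance=float(rec["concordance"]),
            avg_hits=float(rec["avg_hits"]),
        ))
    return calls


# Usage with a made-up one-record file (illustrative values only).
EXAMPLE = (
    "accession\tchromosome\tposition\tref\tcons\t"
    "quality\tsupport\tconcordance\tavg_hits\n"
    "accession_1\tchr1\t100\tA\tT\t40\t12\t0.95\t1.0\n"
)
for call in read_snp_calls(io.StringIO(EXAMPLE)):
    print(call)
```

If the real files use a different delimiter or include extra columns, `COLUMNS` and the `delimiter` argument are the two places to adjust.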


However, classical population genetic analysis usually requires all the
accessions to be combined into a single table in the following format:

<accessions><SNP_1><SNP_2><SNP_3><SNP_...>
accession_1, a,t,g,,,
accession_2, a,t,c,,,
accession_3, t,a,c,,,
accession_,,,,,,,,,,,,,
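As a rough illustration of this long-to-wide conversion (sketched in Python rather than R), the functions below pool the per-accession records, take the union of SNP sites across all accessions, and build one row per accession with one column per site. The names `snp_matrix` and `format_table` are hypothetical. The key assumption is how to treat a site that is a SNP in some accessions but absent from another accession's file: absence may mean that accession matches the reference, or that the site was not covered or failed quality filtering, and a SNP-only file cannot distinguish the two. The sketch fills such sites with the reference base by default and can instead mark them with `N`.

```python
from collections import defaultdict


def snp_matrix(calls, fill_with_reference=True, missing="N"):
    """Pivot long SNP records into one row per accession, one column per site.

    `calls` is an iterable of (accession, chromosome, position, ref, cons)
    tuples pooled from all per-accession files. Returns (sites, rows), where
    `sites` is the sorted list of (chromosome, position) pairs and `rows`
    maps each accession to its bases in the same order as `sites`.
    """
    ref_at = {}                   # (chromosome, position) -> reference base
    genotype = defaultdict(dict)  # accession -> {(chromosome, position): base}
    for acc, chrom, pos, ref, cons in calls:
        site = (chrom, pos)
        ref_at[site] = ref
        genotype[acc][site] = cons
    sites = sorted(ref_at)  # chromosome names are compared as strings
    rows = {}
    for acc in sorted(genotype):
        # Assumption: a site absent from this accession's file is taken as
        # the reference base, or marked `missing` if fill_with_reference is False.
        rows[acc] = [
            genotype[acc].get(s, ref_at[s] if fill_with_reference else missing)
            for s in sites
        ]
    return sites, rows


def format_table(sites, rows):
    """Render the matrix as comma-separated text, one accession per line."""
    header = ["accession"] + [f"{chrom}:{pos}" for chrom, pos in sites]
    lines = [",".join(header)]
    for acc, bases in rows.items():
        lines.append(",".join([acc] + bases))
    return "\n".join(lines)


# Made-up calls for three accessions (illustrative values only).
EXAMPLE_CALLS = [
    ("accession_1", "chr1", 100, "A", "T"),
    ("accession_2", "chr1", 100, "A", "T"),
    ("accession_2", "chr1", 250, "G", "C"),
    ("accession_3", "chr1", 250, "G", "C"),
    ("accession_3", "chr2", 40, "C", "A"),
]
sites, rows = snp_matrix(EXAMPLE_CALLS)
print(format_table(sites, rows))
```

With this made-up input, accession_1 receives the reference bases G and C at chr1:250 and chr2:40, where it has no call. Two caveats of this simple design: chromosome names are sorted as strings, so `chr10` would sort before `chr2`, and an accession with no SNP calls at all does not appear in the output.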

I would appreciate any help or suggestions on how to do this format
conversion in R and Bioconductor, or on alternative approaches. If it
requires database operations, how should I go about that?

Thanks in advance.

-- 
Jian-Feng, Mao


