[BioC] Obtaining exon structure of a gene via Bioconductor

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Feb 2 17:35:03 CET 2010


Hi,

On Tue, Feb 2, 2010 at 11:08 AM, Ruppert Valentino <ruppert7 at hotmail.com> wrote:
> Hello,
>
> I want to do heteroduplex analysis on each exon of around 50 genes. Getting the exon structure for each gene from Ensembl and manually identifying the exon sequences seems very laborious.
>
> Is there a way to use a Bioconductor package to get the exon sequences for all the transcripts of a gene? If so, how can I do this? Would biomaRt do it, and if so, how?
>
> Any example scripts or ideas are greatly appreciated, as it takes hours to get all the exon sequences for a gene split up into files to use for PCR.
>
> thanks in advance for any help on this.

I'm not sure that it really takes hours to get the exon structure ...
I've actually been developing and using a package to do this:

http://wiki.github.com/lianos/GenomeAnnotations

I'm not necessarily recommending that you use this package, but I
outlined the steps you could take to download the refseq gene
annotations for mm9, here:

http://wiki.github.com/lianos/GenomeAnnotations/installing-annotation-packages

In the "Downloading the Gene Annotation File" section.

You'll get a tab-delimited file with one line per transcript. The
exonStarts and exonEnds columns each hold a comma-separated list of
numbers, and those have the information you're looking for.

If you only want a few genes, then parsing that file shouldn't be too bad ...
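
A minimal sketch of that parsing, written in Python only for
illustration (the same split-on-comma logic would work in R with
strsplit). It assumes the file has a header row and that the columns are
named "name", "exonStarts" and "exonEnds" as in UCSC's refGene table;
adjust the names if your header differs. The function name parse_exons
and the example row are made up for this sketch.

```python
import csv
import io


def parse_exons(handle, name_col="name",
                start_col="exonStarts", end_col="exonEnds"):
    """Yield (transcript_name, [(start, end), ...]) for each row.

    The column names are assumptions based on the UCSC refGene layout.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # The lists usually end with a trailing comma, so drop empty pieces.
        starts = [int(s) for s in row[start_col].split(",") if s]
        ends = [int(e) for e in row[end_col].split(",") if e]
        if len(starts) != len(ends):
            raise ValueError("exon start/end counts differ for " + row[name_col])
        yield row[name_col], list(zip(starts, ends))


# Made-up two-exon transcript, not real annotation data.
example = (
    "name\tchrom\tstrand\texonStarts\texonEnds\n"
    "NM_FAKE1\tchr1\t+\t100,300,\t200,400,\n"
)
exons = dict(parse_exons(io.StringIO(example)))
print(exons["NM_FAKE1"])
```

For a real file you would pass an open file handle instead of the
StringIO object. Keep in mind that UCSC tables use 0-based start
coordinates, so you may need to add 1 to the starts before pulling
sequence with a tool that expects 1-based positions.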

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


