[BioC] Obtaining exon structure of a gene via Bioconductor

James W. MacDonald jmacdon at med.umich.edu
Tue Feb 2 20:13:08 CET 2010


Hi Ruppert,

Ruppert Valentino wrote:
> Thanks, Michael.
> 
> This looks great. I wonder if you could direct me to a page that
> explains the database schema that Ensembl uses, as I am interested in
> human genes and am not sure what to put in the query to get, say, the
> exonic sequences of the human TP53 gene.

You don't need to know anything about the database schema to query 
BioMart. You can just go to the martview page

http://www.biomart.org/biomart/martview/

and then go through the GUI and make your selections. I assume Michael 
did that and then clicked on the 'URL' button to get the URI that he 
sent you.

Alternatively, and probably easier in the long run, you can use biomaRt. 
Your query is quite simple:

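library(biomaRt)
## connect to the human gene dataset of the Ensembl BioMart
mart <- useMart("ensembl", "hsapiens_gene_ensembl")
## exon sequences for TP53 (arguments: attributes, filters, values, mart)
seqs <- getBM(attributes = "gene_exon", filters = "hgnc_symbol",
              values = "TP53", mart = mart)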

You can also add other attributes, such as the Ensembl transcript ID, to 
the output by simply appending them to the first argument (the 
attributes argument), like this:

seqs <- getBM(attributes = c("ensembl_transcript_id", "gene_exon"),
              filters = "hgnc_symbol", values = "TP53", mart = mart)
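Since the original question was about getting each gene's exons split up into files for PCR, here is a minimal sketch, in base R, of writing that data.frame out as FASTA. It assumes the result has the ensembl_transcript_id and gene_exon columns requested above; writeExonFasta is a hypothetical helper, not a biomaRt function, and the number after each transcript ID is only a row counter, not the exon's rank within the transcript.

```r
## Hypothetical helper (not part of biomaRt): write a getBM() result that
## has 'ensembl_transcript_id' and 'gene_exon' columns to a FASTA file.
writeExonFasta <- function(seqs, file) {
  stopifnot(all(c("ensembl_transcript_id", "gene_exon") %in% names(seqs)))
  ## Header = transcript ID plus a running row number (NOT the exon rank)
  headers <- paste0(">", as.character(seqs$ensembl_transcript_id),
                    "|", seq_len(nrow(seqs)))
  ## rbind() gives a 2 x n matrix; reading it column-wise interleaves
  ## each header line with its sequence line
  writeLines(as.vector(rbind(headers, as.character(seqs$gene_exon))), file)
  invisible(file)
}

## Usage, after running the query above:
## writeExonFasta(seqs, "TP53_exons.fa")
```

Keeping the writer in base R means it works on any data.frame with those two columns, whether it came from one gene or from a subset of a larger multi-gene query.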

You can also query multiple gene symbols at once. If you need many 
genes, request them all in a single query and parse the resulting 
data.frame. In that case you are advised to include hgnc_symbol in the 
attributes, since the returned rows are not necessarily sorted in 
the way you might expect.
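For the roughly 50 genes in the original question, the multi-gene version might look like the sketch below. It assumes biomaRt is installed and the Ensembl BioMart is reachable; fetchExons and splitByGene are hypothetical helper names, and the gene symbols in the usage comment are only examples. hgnc_symbol is requested as an attribute so that every row can be assigned back to its gene regardless of the order in which rows come back.

```r
## Hypothetical helper: one getBM() call for many HGNC symbols at once.
## hgnc_symbol is included in the attributes so every row carries its gene.
fetchExons <- function(symbols, mart) {
  biomaRt::getBM(attributes = c("hgnc_symbol", "ensembl_transcript_id",
                                "gene_exon"),
                 filters = "hgnc_symbol", values = symbols, mart = mart)
}

## Split the single returned data.frame into a list with one data.frame
## per gene; this does not rely on the row order of the query result.
splitByGene <- function(seqs) {
  split(seqs, seqs$hgnc_symbol)
}

## Usage (needs network access to the Ensembl BioMart):
## mart   <- biomaRt::useMart("ensembl", "hsapiens_gene_ensembl")
## seqs   <- fetchExons(c("TP53", "BRCA1", "BRCA2"), mart)
## byGene <- splitByGene(seqs)
## names(byGene)   # one entry per symbol that returned data
```

Doing a single query and splitting locally keeps the number of round trips to the server at one, which is the point of querying all genes at once.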

Best,

Jim


> 
> Thanks
> 
>> Subject: Re: [BioC] Obtaining exon structure of a gene via Bioconductor
>> From: Michael.Dondrup at uni.no
>> Date: Tue, 2 Feb 2010 17:41:39 +0100
>> CC: bioconductor at stat.math.ethz.ch
>> To: ruppert7 at hotmail.com
>> 
>> Hi, this is also possible with BioMart and therefore also with
>> biomaRt. The following query gives an example: it fetches all exon
>> sequences for the C. elegans gene with Ensembl gene ID T24D1.1 in
>> FASTA format (try this URL):
>> 
>> http://www.biomart.org/biomart/martview?VIRTUALSCHEMANAME=default&ATTRIBUTES=celegans_gene_ensembl.default.sequences.ensembl_gene_id|celegans_gene_ensembl.default.sequences.ensembl_transcript_id|celegans_gene_ensembl.default.sequences.gene_exon&FILTERS=celegans_gene_ensembl.default.filters.ensembl_gene_id."T24D1.1"|celegans_gene_ensembl.default.filters.biotype."protein_coding"&VISIBLEPANEL=resultspanel
>> 
>> 
>> If you like this, the parameters can be translated almost directly
>> into the corresponding query in biomaRt, although I don't think this
>> is necessary in this case.
>> 
>> Best Michael
>> 
>> On Feb 2, 2010, at 5:08 PM, Ruppert Valentino wrote:
>> 
>>> 
>>> Hello,
>>> 
>>> I want to do heteroduplex analysis on each exon of around 50 genes.
>>> Getting the exon structure for each gene from Ensembl and
>>> manually identifying the exon sequences seems very laborious.
>>> 
>>> Is there a way to use a Bioconductor package to get the exon
>>> sequences for all the transcripts of a gene? If so, how can I do
>>> this? Would biomaRt do it, and if so, how?
>>> 
>>> Any example scripts or ideas would be greatly appreciated, as it
>>> takes hours to get all the exon sequences for a gene split up
>>> into files to use for PCR.
>>> 
>>> Thanks in advance for any help on this.
>>> 
>>> Raphael
>>> 
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> 

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


