[BioC] Obtaining exon structure of a gene via Bioconductor

Michael Dondrup michael.dondrup at uni.no
Wed Feb 3 13:18:49 CET 2010


Hi James and Ruppert,
Yes, I clicked on the URL button. I actually like the idea of trying out queries in the web interface first, because I
find it quite intuitive: you can see at a glance the available parameters, which can differ between databases. I am
sorry, I should have explained that a bit better.
It just occurred to me that it would be great if BioMart, in addition to
the Perl, XML and URL buttons, also had an R button that provides the query in terms of biomaRt. Maybe this could be done
by conversion from the XML output? Just an idea.
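
To illustrate, here is a rough sketch of what such an R button might emit for the C. elegans martview URL quoted further down in this thread. The dataset, attribute, and filter names are read straight off that URL; the wrapper function fetch_celegans_exons is a hypothetical name of mine, not part of biomaRt, and I am assuming the names are still valid on the current Ensembl mart.

```r
# Query pieces taken from the martview URL quoted later in this thread.
dataset    <- "celegans_gene_ensembl"
attributes <- c("ensembl_gene_id", "ensembl_transcript_id", "gene_exon")
filters    <- c("ensembl_gene_id", "biotype")
values     <- list("T24D1.1", "protein_coding")  # one entry per filter, same order

# Hypothetical helper (not part of biomaRt): runs the query when called.
# Requires the biomaRt package and network access.
fetch_celegans_exons <- function() {
  mart <- biomaRt::useMart("ensembl", dataset = dataset)
  biomaRt::getBM(attributes = attributes,
                 filters    = filters,
                 values     = values,
                 mart       = mart)
}

# exons <- fetch_celegans_exons()   # uncomment to run against the live mart
```

With several filters, values has to be a list with one element per filter, in the same order as the filters vector.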

Michael


On Feb 2, 2010, at 8:13 PM, James W. MacDonald wrote:

> Hi Ruppert,
> 
> Ruppert Valentino wrote:
>> Thanks Michael
>> This looks great. I wonder if you could direct me to a page that
>> explains the database schema that ensembl uses as I am interested in
>> human genes and not sure what to put in the query to get say human
>> TP53 gene exonic sequences?
> 
> You don't need to know anything about the database schema to query Biomart. You can just go to the martview page
> 
> http://www.biomart.org/biomart/martview/
> 
> and then go through the GUI and make your selections. I assume Michael did that and then clicked on the 'URL' button to get the URI that he sent you.
> 
> Alternatively, and probably easier in the long run, you can use biomaRt. Your query is quite simple:
> 
> library(biomaRt)
> mart <- useMart("ensembl","hsapiens_gene_ensembl")
> seqs <- getBM("gene_exon","hgnc_symbol","TP53", mart)
> 
> You can also add other things, like the Ensembl transcript ID, to the output by simply appending to the first argument (the attributes argument), like this:
> 
> seqs <- getBM(c("ensembl_transcript_id", "gene_exon"), "hgnc_symbol", "TP53", mart)
> 
> You can also query multiple gene symbols at one time. If you need to do many genes, do them all at once and parse the resulting data.frame. In that case you are advised to add hgnc_symbol to the attributes as well, as the returned data are not necessarily sorted in the way you might expect.
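
A minimal sketch of the multi-gene query described above, assuming the same human dataset. The symbol BRCA1 and the helper name fetch_exons_by_gene are illustrative only and do not come from Jim's mail; hgnc_symbol is included in the attributes so the rows can be grouped by gene afterwards.

```r
# Example symbols: TP53 is from the thread, BRCA1 is only an illustration.
genes <- c("TP53", "BRCA1")

# hgnc_symbol is requested too, because the rows are not guaranteed
# to come back in the order of the input symbols.
attrs <- c("hgnc_symbol", "ensembl_transcript_id", "gene_exon")

# Hypothetical helper (not part of biomaRt): one query for all symbols,
# then one data.frame per gene. Requires biomaRt and network access.
fetch_exons_by_gene <- function(symbols) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  seqs <- biomaRt::getBM(attributes = attrs,
                         filters    = "hgnc_symbol",
                         values     = symbols,
                         mart       = mart)
  split(seqs, seqs$hgnc_symbol)
}

# exons_by_gene <- fetch_exons_by_gene(genes)   # uncomment to run
```

The result is a named list, so exons_by_gene[["TP53"]] would hold only the TP53 rows.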
> 
> Best,
> 
> Jim
> 
> 
>> thanks
>>> Subject: Re: [BioC] Obtaining exon structure of a gene via Bioconductor
>>> From: Michael.Dondrup at uni.no
>>> Date: Tue, 2 Feb 2010 17:41:39 +0100
>>> CC: bioconductor at stat.math.ethz.ch
>>> To: ruppert7 at hotmail.com
>>>
>>> Hi, this is also possible with BioMart and therefore also with
>>> biomaRt. The following query is an example: it fetches all exon
>>> sequences for the C. elegans gene with Ensembl gene ID T24D1.1 in
>>> FASTA format (try this URL):
>>> http://www.biomart.org/biomart/martview?VIRTUALSCHEMANAME=default&ATTRIBUTES=celegans_gene_ensembl.default.sequences.ensembl_gene_id|celegans_gene_ensembl.default.sequences.ensembl_transcript_id|celegans_gene_ensembl.default.sequences.gene_exon&FILTERS=celegans_gene_ensembl.default.filters.ensembl_gene_id."T24D1.1"|celegans_gene_ensembl.default.filters.biotype."protein_coding"&VISIBLEPANEL=resultspanel
>>> If you like this, the parameters can be translated almost directly
>>> into the corresponding query in biomaRt, although I don't think this
>>> is necessary in this case.
>>> Best Michael
>>> On Feb 2, 2010, at 5:08 PM, Ruppert Valentino wrote:
>>>> Hello,
>>>> I want to do heteroduplex analysis on each exon of around 50 genes.
>>>> Getting the exon structure for each gene from Ensembl and
>>>> manually identifying the exon sequences seems very laborious.
>>>> Is there a way to use a Bioconductor package to get the exon
>>>> sequences for all the transcripts of a gene? If so, how can I do
>>>> this? Would biomaRt do it, and if so, how?
>>>> Any example scripts or ideas would be greatly appreciated, as it
>>>> takes hours to get all the exon sequences for a gene split up
>>>> into files to use for PCR.
>>>> Thanks in advance for any help on this.
>>>> Raphael
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826


