[BioC] R: is there an identifier that uniquely identifies a gene all over the many databases ?

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Jul 13 01:52:30 CEST 2009


Hi,

> My goal is to get the 3' UTR sequences associated with experimentally
> validated genes.
> After I enter the "Human" species and the miRNA identifier "hsa-miR-yyy",
> the TarBase interface returns a list of all genes (ENSGxxxxxx) that have
> been experimentally tested.
> I pass each ENSGxxxxxx identifier to getSequence (a biomaRt function)
> to get the 3' UTR sequence.
> I was surprised to find multiple 3' UTR sequences associated with the
> same ENSGxxxxxx.
> Maybe each transcript is identified by a unique ENSTxxxx
> identifier... TRUE/FALSE?

That's likely the case (one Ensembl gene, ENSG, can have several
transcripts, ENST, each with its own 3' UTR), but you can easily verify
this yourself.

Just add "ensembl_transcript_id" (in addition to the ensembl_gene_id  
you already have) as one of the attributes you'd like returned in your  
getBM query to see if that explains the multiple 3_utr_start/end  
results you get.
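A minimal sketch of that check in R, assuming the biomaRt package and the human Ensembl dataset "hsapiens_gene_ensembl". The helper names fetch_utr_coords() and transcripts_per_gene() are hypothetical, and the attribute names should be checked against listAttributes(mart) for your Ensembl release. The sketch counts distinct transcript IDs rather than rows, because the 3' UTR coordinate attributes may also return more than one row per transcript (for example when a UTR spans several exons).

```r
# Hypothetical helper: fetch 3' UTR coordinates together with the
# transcript each row belongs to. Uses biomaRt::getBM() directly, so the
# package is only needed when the function is actually called.
fetch_utr_coords <- function(gene_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "ensembl_transcript_id",
                   "3_utr_start", "3_utr_end"),
    filters    = "ensembl_gene_id",
    values     = gene_ids,
    mart       = mart
  )
}

# Hypothetical helper: count distinct transcripts per gene in a getBM()
# result, so repeated rows within one transcript are counted once.
transcripts_per_gene <- function(res) {
  tapply(res$ensembl_transcript_id, res$ensembl_gene_id,
         function(tx) length(unique(tx)))
}

# Example usage (needs network access to Ensembl; "ENSGxxxxxx" is a
# placeholder for one of your TarBase gene IDs):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# res  <- fetch_utr_coords("ENSGxxxxxx", mart)
# transcripts_per_gene(res)
#
# If a gene maps to several transcripts, you can then request one 3' UTR
# per transcript instead of per gene:
# biomaRt::getSequence(id = unique(res$ensembl_transcript_id),
#                      type = "ensembl_transcript_id",
#                      seqType = "3utr", mart = mart)
```

If transcripts_per_gene() reports more than one transcript for a gene, that would account for the multiple 3' UTR sequences you see from getSequence.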

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos


