[BioC] Help with the terminology

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue May 4 16:30:24 CEST 2010


Hi,

On Tue, May 4, 2010 at 10:05 AM,  <mauede at alice.it> wrote:
> TarBase identifies gene targets through three identifiers, namely "HGNC_ID", "HGNC_Symbol", and "ENSGxxxxxxxxx".
> Which of these identifiers is the most reliable (unique) for extracting the 3'UTR sequences from Ensembl through BioMart
> function calls?

ENSGxxxxx stands for "Ensembl Gene ID", so if you're going to be using
Ensembl, then I'd imagine that would be the best one to query with.

One thing to note is that each gene can have several transcripts,
identified by ENSTxxxxxx. Given that different transcripts can have
different 3'UTRs, and that you're looking at miRNA-to-target
interactions (that usually happen in 3'UTRs), there's another
dimension you have to deal with.
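
To make that extra dimension concrete, here is a rough sketch in Python. It
assumes you have already pulled (gene ID, transcript ID, 3'UTR sequence)
rows out of BioMart by some means. The function name, the row layout, and
the IDs and sequences below are all placeholders invented for illustration,
not real Ensembl records or part of any BioMart API.

```python
from collections import defaultdict


def group_utrs_by_gene(rows):
    """Map gene ID -> {transcript ID -> 3'UTR sequence}.

    `rows` is assumed to be an iterable of
    (gene_id, transcript_id, utr_sequence) tuples.
    """
    utrs = defaultdict(dict)
    for gene_id, transcript_id, utr_seq in rows:
        utrs[gene_id][transcript_id] = utr_seq
    return dict(utrs)


# Placeholder IDs and sequences, made up purely for illustration.
rows = [
    ("ENSG_A", "ENST_A1", "ACGTACGT"),
    ("ENSG_A", "ENST_A2", "ACGT"),
    ("ENSG_B", "ENST_B1", "GGCCAA"),
]

by_gene = group_utrs_by_gene(rows)
for gene_id, transcripts in by_gene.items():
    # A gene with more than one distinct 3'UTR needs a per-transcript decision.
    n_distinct = len(set(transcripts.values()))
    print(gene_id, "transcripts:", len(transcripts), "distinct 3'UTRs:", n_distinct)
```

Keeping the transcript level in the data structure means you can decide
later whether a target site has to be present in every transcript's 3'UTR
or only in some of them.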

> What is the difference between "HGNC_ID" and "Entrez Gene ID" ?

http://www.genenames.org/aboutHGNC.html


> Whenever I look up such names, I find descriptions of the history and reasons why they exist
> and of the agency in charge of assigning them ... but I am missing how they are used and how they differ from other
> identifiers that, *to my eyes*, look similar.

OK ...

> TarBase contains one miRNA called "Edited-miR-376a-5p". Could you please explain the meaning of this name and advise on the usage of such data?
> Should I drop the "Edited-" substring and consider it a regular validated miRNA?

The first hit from a Google search for "Edited-miR-376a-5p" will get you here:
http://stke.sciencemag.org/cgi/content/full/sci;315/5815/1137

From reading the abstract of that paper, I would imagine the "edited"
part is actually quite important/relevant information.
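
If you do want to line the name up against regular miRNA names, one option
is to split the prefix off but keep it as a flag instead of throwing it
away. A minimal sketch in Python follows. The helper name is hypothetical,
and it assumes the "Edited-" prefix is the only way such entries are marked
in your copy of the data.

```python
def split_edited_prefix(name, prefix="Edited-"):
    """Return (base_name, is_edited) for a miRNA name.

    Hypothetical helper: assumes edited entries are marked only by a
    leading "Edited-" prefix.
    """
    if name.startswith(prefix):
        return name[len(prefix):], True
    return name, False


for mirna in ["Edited-miR-376a-5p", "miR-376a-5p"]:
    base_name, is_edited = split_edited_prefix(mirna)
    print(base_name, "edited" if is_edited else "unedited")
```

That way the edited and unedited forms stay distinguishable downstream,
even though they share a base name.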

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


