[BioC] from rat codelink to human locuslink

Steffen Durinck durincks at mail.nih.gov
Fri Nov 3 15:18:05 CET 2006


Hi Weiwei,

Unfortunately, Ensembl currently doesn't have CodeLink identifiers for 
the rat dataset; it only holds them for the human dataset.
However, if you have rat UniGene identifiers, there are two ways to get 
the corresponding human EntrezGene (= LocusLink) identifiers.

Here's how to do it:

library(biomaRt)
rat = useMart("ensembl", dataset="rnorvegicus_gene_ensembl")
human = useMart("ensembl", dataset="hsapiens_gene_ensembl")
ratUnigene = c("Rn.107913","Rn.32316","Rn.112856")

# Now you can do the mapping with getBM in two steps, using the human
# Ensembl gene identifiers as the bridge from rat to human
humanEnsemblId = getBM(c("unigene", "human_ensembl_gene"),
                       filters="unigene", values=ratUnigene, mart=rat)
humanEntrezGene = getBM(c("ensembl_gene_id", "entrezgene"),
                        filters="ensembl_gene_id", values=humanEnsemblId[,2],
                        mart=human)
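
To line the two results up by rat UniGene ID, something like base R's merge() should work. What follows is only a minimal sketch: step1 and step2 are hypothetical stand-ins for the two getBM() results, every ID in them is a made-up placeholder rather than real query output, and it assumes the returned columns are named after the attributes requested.

```r
# Hypothetical stand-ins for the two getBM() results above.
# All IDs are made-up placeholders, not real Ensembl/UniGene/EntrezGene values.
step1 <- data.frame(
  unigene            = c("Rn.0001", "Rn.0002", "Rn.0003"),
  human_ensembl_gene = c("ENSG_A", "ENSG_B", "ENSG_C"),
  stringsAsFactors   = FALSE
)
step2 <- data.frame(
  ensembl_gene_id  = c("ENSG_A", "ENSG_B"),
  entrezgene       = c(111L, 222L),
  stringsAsFactors = FALSE
)

# Join on the human Ensembl gene ID so each rat UniGene ID ends up next to
# its human EntrezGene ID. all.x = TRUE keeps rat IDs whose human gene has
# no EntrezGene entry; their entrezgene value is NA.
ratToHuman <- merge(step1, step2,
                    by.x = "human_ensembl_gene", by.y = "ensembl_gene_id",
                    all.x = TRUE)
ratToHuman <- ratToHuman[, c("unigene", "human_ensembl_gene", "entrezgene")]
print(ratToHuman)
```

With the real results you would pass humanEnsemblId and humanEntrezGene in place of step1 and step2, after checking their column names with colnames().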

# Or you could use the getHomolog function, which does this in one step.
# However, it only returns the EntrezGene IDs, so if you start from a list
# of UniGene identifiers you get a list of human EntrezGene identifiers
# that you cannot match up to your input unless you query them one by one.
# I'll see if I can make getHomolog return both the identifier you start
# from and the identifier you want to retrieve, so you can easily match
# things up.

getHomolog(id = ratUnigene, from.type="unigene", to.type="entrezgene", 
from.mart=rat, to.mart=human)

Hope this helps,
Steffen






Weiwei Shi wrote:
> Yes, that's why I was confused, b/c I checked some codelink IDs and they
> start with GE. But I used that package with the unigene IDs the data
> provides; some are recognized by biomaRt while some are not.
> I will try tomorrow (it is too late for today :( ).
>
> weiwei
>
> On 11/2/06, Diego Diez <diez at kuicr.kyoto-u.ac.jp> wrote:
>   
>> On Thu, 2 Nov 2006, Sean Davis wrote:
>>
>>     
>>> Weiwei Shi wrote:
>>>       
>>>> Hi,
>>>>
>>>> I have 3 examples like this:
>>>> Probe_ID        UniGene_ID      UniGene_Name
>>>> AA799301_PROBE1 Rn.107913       Lgtn protein (DBSS)
>>>> AA799313_PROBE1 Rn.32316        "sialyltransferase 10
>>>> (alpha-2,3-sialyltransferase VI)"
>>>> AA799329_PROBE1 Rn.112856       RIKEN cDNA 4632417K18 (Mm.) (DBSS)
>>>>
>>>> I think the UniGene_ID might work for the purpose of using the biomaRt
>>>> package (is that what you meant by biomart?). But I looked through the
>>>> package intro and did not find how to convert between species. Should I
>>>> choose the dataset for rat first and then use a rat2human conversion?
>>>> (I have a local program to do that, but I am curious how biomaRt or
>>>> other packages in R do this.)
>>>>         
>>> Hi, Weiwei.  You'll probably want to look at the help pages for biomaRt
>>> (note the correct capitalization--sorry for the confusion).  To see a
>>> list of help pages, you can use the simple command:
>>>
>>>  > help(package=biomaRt)
>>>
>>> There are a couple of functions that look promising: getXref and
>>> getHomolog.  You might want to look into those a bit.
>>>
>>> As for your probe ID's, it looks like they are a concatenation of a
>>> Genbank accession number and "PROBE1", so those could be useful.
>>> Unigene ID could also potentially be useful, but that depends a bit on
>>> how old the annotation is, as Unigene IDs change and are deleted on a
>>> regular basis as part of each new unigene build.
>>>       
>> Actually, those probe IDs are not the currently used Codelink probe IDs
>> but the LEGACY_PROBE_NAME. The annotation packages found in Bioconductor
>> don't use these probe IDs, so they cannot be used to map to public
>> identifiers.  I wonder whether the CUSTUMER_PROBE_NAME is also available;
>> it has the form GExxxxx (x being digits, like GE12209) and is the
>> identifier used by Codelink.
>>
>> D.
>>
>>     
>>> Sean
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>       
>>     
>
>
>   


-- 
Steffen Durinck, Ph.D.

Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/

Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877


