[BioC] from rat codelink to human locuslink

Steffen Durinck durincks at mail.nih.gov
Fri Nov 3 22:26:36 CET 2006


Hi Weiwei,

By default biomaRt runs in webservice mode.  Running queries in a large 
loop in webservice mode tends to crash, and in this case it is better to 
use the package in MySQL mode.  In webservice mode you could build your 
look-up table with just the two queries that I suggested in the first 
solution.

However, there is an easier way to get what you want: when biomaRt is 
used in MySQL mode, the output of getHomolog contains both the query ids 
(rat unigene ids) and the results (human entrezgene ids), so there is no 
need for time-consuming big loops.

Try the following:

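library(biomaRt)

# Connect to the human and rat Ensembl datasets in MySQL mode
human = useMart("ensembl", dataset="hsapiens_gene_ensembl", mysql=TRUE)
rat = useMart("ensembl", dataset="rnorvegicus_gene_ensembl", mysql=TRUE)
ratUnigene = c("Rn.32316","Rn.171821")
# A single call maps all rat unigene ids to human entrezgene ids
getHomolog(id = ratUnigene, from.type="unigene", 
to.type="entrezgene", from.mart=rat, to.mart=human)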

It should give:

       id MappedID
1  Rn.32316    10402
2 Rn.171821     7058

Note that Ensembl maps everything to the transcript level, which 
explains why you might find redundant information in the output.
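
If that redundancy gets in the way, the result can be collapsed into a 
local look-up table with base R. The following is only a minimal sketch: 
the data frame is a hypothetical stand-in for a getHomolog result (the 
column names follow the output above, while the duplicated and NA rows 
are made up for illustration), and it assumes each rat id maps to at 
most one distinct human id once duplicates are removed.

```r
# Hypothetical getHomolog-style result with transcript-level redundancy.
# Column names follow the output shown above; the rows are illustrative.
res <- data.frame(
  id       = c("Rn.32316", "Rn.32316", "Rn.171821", "Rn.171821"),
  MappedID = c(10402, NA, 7058, 7058),
  stringsAsFactors = FALSE
)

# Drop rows without a human entrezgene id, then remove exact duplicates.
lookup <- unique(res[!is.na(res$MappedID), ])

# Named vector for local look-ups: rat unigene id -> human entrezgene id.
# Assumption: at most one distinct MappedID per id after collapsing.
rat2human <- setNames(lookup$MappedID, lookup$id)
rat2human[c("Rn.171821", "Rn.32316")]
```

Once such a table exists, every further conversion is a plain vector 
look-up and needs no connection to the server.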

Cheers,
Steffen

Weiwei Shi wrote:
> Hi, there:
>
> I like the getHomolog solution (the first one does not seem to work
> for me), but I need to make some modifications because of an issue
> like this:
>> getHomolog(id=ratUnigene[5], from.type="unigene", to.type="entrezgene",
> + from.mart=rat, to.mart=human)
>               V1              V2    V3
> 1 ENSG00000095397 ENST00000362057 25861
> 2 ENSG00000095397 ENST00000265134    NA
> 3 ENSG00000095397 ENST00000361938 25861
> 4 ENSG00000095397 ENST00000374059    NA
> 5 ENSG00000095397 ENST00000374057    NA
>
> For one ratUnigene id there are five $V3 values, so I take the first
> unique one:
> t1 <- sapply(ratUnigene, function(i) unique(getHomolog(id=i,
> from.type="unigene", to.type="entrezgene",
> from.mart=rat, to.mart=human)$V3)[1])
>
>> as.character(t1)
> [1] "NULL"   "10402"  "NULL"   "NULL"   "25861"  "8706"   "195827"
> [8] "NULL"   "NULL"   "NULL"   "NULL"   "NULL"   "55884"  "NULL"
> [15] "NULL"   "3898"   "23324"  "NULL"   "NULL"   "NULL"
>
> Of course, I am assuming that $V3 contains only one distinct id plus NA.
>
> However, with ~7400 unigenes the run is supposed to finish after about
> 78 min, and I run into a connection issue:
>
>> system.time(t1 <- sapply(ratUnigene, function(i)
> + unique(getHomolog(id=i, from.type="unigene",
> + to.type="entrezgene", from.mart=rat, to.mart=human)$V3)[1]))
> Error in postForm(paste(to.mart@host, "?", sep = ""), query = xmlQuery) :
>        couldn't connect to host
> In addition: There were 50 or more warnings (use warnings() to see the 
> first 50)
> Timing stopped at: 1.641 0.22 444.603 0 0
>
> So I am wondering if there is a way to download a lookup table and do
> the conversion locally. By the way, the 78 minutes is for 7400
> conversions.
>
>
>
> Weiwei


-- 
Steffen Durinck, Ph.D.

Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/

Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877


