[BioC] getHomolog in biomaRt

Steffen Durinck durincks at mail.nih.gov
Tue Apr 10 17:17:38 CEST 2007


Hi Steve,

Which version of biomaRt are you using?
I would recommend using the devel version, as this will return both the 
query id and its homolog id.

 > human = useMart("ensembl", dataset="hsapiens_gene_ensembl")
 > mouse = useMart("ensembl", dataset="mmusculus_gene_ensembl")
 > getHomolog( id = c("66645","64058"), to.type = "entrezgene",
       from.type = "entrezgene", from.mart = mouse, to.mart = human )
     V1    V2
1 64058 64065
2 66645 55269
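
On the question of keeping the loop running when a single ID fails: base
R's tryCatch() can trap the error, so the failing ID can be recorded and
the loop carries on. A minimal sketch, assuming a stand-in function
lookup_one() (hypothetical, used here in place of the real getHomolog()
call so that the example runs without a network connection; the values it
returns are made up):

```r
# lookup_one() is a hypothetical stand-in for a single getHomolog() call.
# It fails for one ID to imitate the read.table() error quoted below.
lookup_one <- function(id) {
  if (id == "380768") stop("no lines available in input")
  paste("H", id, sep = "")  # made-up result, not a real homolog ID
}

ids    <- c("66645", "380768", "64058")
found  <- character(0)   # results for IDs whose query succeeded
failed <- character(0)   # IDs whose query raised an error

for (id in ids) {
  # The error handler returns NULL instead of letting the error abort the loop
  res <- tryCatch(lookup_one(id), error = function(e) NULL)
  if (is.null(res)) {
    failed <- c(failed, id)   # remember the ID and keep going
  } else {
    found[id] <- res          # store the result under the query ID
  }
}

failed   # IDs to exclude from the batch or retry later
```

In a real loop the call to lookup_one(id) would be replaced by the
getHomolog() call for that ID; the rest should carry over unchanged.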


 > sessionInfo()
R version 2.4.0 (2006-10-03)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
 biomaRt    RCurl      XML
"1.9.22"  "0.8-0"  "1.4-1"

Cheers,
Steffen

Steve Pederson wrote:
> Hi,
>
> I'm still on a steep learning curve with R and am trying to convert a
> large batch of mouse Entrez IDs to their homologous human Entrez IDs.
> When I send them as a batch to biomaRt, the result doesn't contain the
> query ID (could this be added in the next update?), so the results
> cannot be matched back to the original IDs. For example:
>
>  > getHomolog( id = c("73663","66645","74855"), to.type = "entrezgene", 
> from.type = "entrezgene", from.mart = mouse, to.mart=human )
>       V1
> 1 55269
>
> As a result, I'm sending one at a time via a quick function that I set 
> up. The batch regularly seems to fail & I get the following error message:
> Error in read.table(con, sep = "\t", header = FALSE, quote = "", 
> comment.char = "",  :
>          no lines available in input
>
> This is an example of the exact code that causes it:
> library(biomaRt)
> human <- useMart("ensembl","hsapiens_gene_ensembl")
> mouse <- useMart("ensembl","mmusculus_gene_ensembl")
> getHomolog( id = "380768", to.type = "entrezgene", from.type = 
> "entrezgene", from.mart = mouse, to.mart=human )
>
> The response is not NULL, as my code is set up to handle this response.
>
> My main question is: does anyone know how I can stop the loop from
> aborting when I receive this error message, which I think is external?
> If I can record which specific IDs are causing the error, I could
> exclude them from the original batch, but the error handling in the R
> help is a bit murky to me. My actual function is included below
> (biomaRt.conversion).
>
> Unfortunately, I don't have any MySQL experience (yet) so that isn't an 
> option for me as an alternative.
>
> The list consists of the IDs that could not be matched by
> ProbeMatchDB2.0, as that database maps via Unigene:
> http://brainarray.mbni.med.umich.edu/Brainarray/Database/ProbeMatchDB/ncbi_probmatch_para_step1.asp
>
> Thanks,
>
> Steve
>
>
>
> biomaRt.conversion <- function(x,from.id,to.id,from.sp,to.sp)
>    {
>      # x is the initial list of ids
>      # from.id & to.id are the type of codes (e.g. entrez or unigene)
>      # from.sp & to.sp can only be "human" or "mouse"
>      # Warnings will need to be suppressed in the case of no match existing
>      homologs <- c()
>      no.homolog <- c()
>      if (from.sp=="human") mart1 <- useMart("ensembl","hsapiens_gene_ensembl")
>      if (to.sp=="human") mart2 <- useMart("ensembl","hsapiens_gene_ensembl")
>      if (from.sp=="mouse") mart1 <- useMart("ensembl","mmusculus_gene_ensembl")
>      if (to.sp=="mouse") mart2 <- useMart("ensembl","mmusculus_gene_ensembl")
>      for (i in 1:length(x))
>        {
>          suppressWarnings(hum <- getHomolog( id = x[i], to.type = to.id,
>              from.type = from.id, from.mart = mart1, to.mart = mart2))
>          if (is.null(hum)==FALSE) # if a homolog was found
>            {
>              #A duplicate removal stage
>              if(dim(hum)[1]>1)
>                {
>                  j=1 # the first entry in hum to check for duplicates
>                  k=dim(hum)[1]
>                  while(j<k)
>                    {
>                      if(length(which(hum==hum[j]))>1) # if there is a duplicate
>                        {
>                          #removes all the duplicates except the first
>                          hum <- hum[-(which(hum==hum[j])[-1]),]
>                          #reset the values
>                          if(is.null(dim(hum)[1])==TRUE)
>                            {
>                              k=0 #this will exit the loop if "hum" is now a single value
>                            }
>                          else
>                            {
>                              k=dim(hum)[1]
>                              j=j+1
>                            }
>                        }
>                    }
>                }
>
>              for (j in 1:length(hum))
>                {
>                  homologs <- rbind(homologs,c(x[i],hum[j]))
>                }
>
>            }
>          else #if no homolog was found
>            {
>              no.homolog <- c(no.homolog,x[i])
>            }
>        }
>      colnames(homologs) <- c(paste(from.sp,"ID",sep="."),paste(to.sp,"ID",sep="."))
>      list(homologs=data.frame(homologs),no.homolog=no.homolog)
>    }
>


-- 
Steffen Durinck, Ph.D.

Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/

Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877


