[BioC] getHomolog in biomaRt

Steve Pederson stephen.pederson at student.adelaide.edu.au
Wed Apr 11 08:30:29 CEST 2007


Hi Steffen,

Thanks for the response; that sorted my problem out rather well. I had 
been using biomaRt 1.8.2.

Cheers,

Steve

Steffen Durinck wrote:
> Hi Steve,
> 
> Which version of biomaRt are you using?
> I would recommend using the devel version, as this will return both the 
> query id and its homolog id.
> 
>  >human=useMart("ensembl", dataset="hsapiens_gene_ensembl")
>  >mouse = useMart("ensembl", dataset="mmusculus_gene_ensembl")
>  > getHomolog( id = c("66645","64058"), to.type = "entrezgene",from.type 
> = "entrezgene", from.mart = mouse, to.mart=human )
>     V1    V2
> 1 64058 64065
> 2 66645 55269
> 
> 
>  > sessionInfo()
> R version 2.4.0 (2006-10-03)
> x86_64-unknown-linux-gnu
> 
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C 
> 
> 
> attached base packages:
> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "base"
> 
> other attached packages:
> biomaRt    RCurl      XML
> "1.9.22"  "0.8-0"  "1.4-1"
> 
> Cheers,
> Steffen
> 
> Steve Pederson wrote:
>> Hi,
>>
>> I'm still on a steep learning curve with R and am trying to convert a 
>> large batch of mouse Entrez IDs to their homologous human Entrez IDs. 
>> When I send them as a batch to biomaRt, the search result doesn't 
>> contain the query ID (could this be added in the next update?), so the 
>> results can't be matched back to the original IDs. For example:
>>
>>  > getHomolog( id = c("73663","66645","74855"), to.type = 
>> "entrezgene", from.type = "entrezgene", from.mart = mouse, 
>> to.mart=human )
>>       V1
>> 1 55269
>>
>> As a result, I'm sending one at a time via a quick function that I set 
>> up. The batch regularly seems to fail & I get the following error 
>> message:
>> Error in read.table(con, sep = "\t", header = FALSE, quote = "", 
>> comment.char = "",  :
>>          no lines available in input
>>
>> This is an example of the exact code that causes it:
>> library(biomaRt)
>> human <- useMart("ensembl","hsapiens_gene_ensembl")
>> mouse <- useMart("ensembl","mmusculus_gene_ensembl")
>> getHomolog( id = "380768", to.type = "entrezgene", from.type = 
>> "entrezgene", from.mart = mouse, to.mart=human )
>>
>> The response is not NULL; my code is already set up to handle a NULL 
>> response.
>>
>> My main question is: does anyone know how I can stop the loop aborting 
>> when I receive this error message, which I think is external? If I can 
>> record which specific IDs are causing the error, I could exclude them 
>> from the original batch, but the error-handling is a bit murky to my 
>> reading in the R help. My actual function is included below 
>> (biomaRt.conversion).
>>
>> Unfortunately, I don't have any MySQL experience (yet) so that isn't 
>> an option for me as an alternative.
>>
>> The list consists of the IDs that could not be matched in 
>> ProbeMatchDB 2.0, as that database maps via UniGene:
>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/ProbeMatchDB/ncbi_probmatch_para_step1.asp 
>>
>>
>> Thanks,
>>
>> Steve
>>
>>
>>
>> biomaRt.conversion <- function(x,from.id,to.id,from.sp,to.sp)
>>    {
>>      # x is the initial list of ids
>>      # from.id & to.id are the types of ID (e.g. entrez or unigene)
>>      # from.sp & to.sp can only be "human" or "mouse"
>>      # Warnings will need to be suppressed in the case of no match existing
>>      homologs <- c()
>>      no.homolog <- c()
>>      if (from.sp=="human") mart1 <- useMart("ensembl","hsapiens_gene_ensembl")
>>      if (to.sp=="human") mart2 <- useMart("ensembl","hsapiens_gene_ensembl")
>>      if (from.sp=="mouse") mart1 <- useMart("ensembl","mmusculus_gene_ensembl")
>>      if (to.sp=="mouse") mart2 <- useMart("ensembl","mmusculus_gene_ensembl")
>>      for (i in 1:length(x))
>>        {
>>          suppressWarnings(hum <- getHomolog( id = x[i], to.type=to.id, 
>> from.type =from.id, from.mart = mart1, to.mart = mart2))
>>          if (is.null(hum)==FALSE) # if a homolog was found
>>            {
>>              #A duplicate removal stage
>>              if(dim(hum)[1]>1)
>>                {
>>                  j=1 # the first entry in hum to check for duplicates
>>                  k=dim(hum)[1]
>>                  while(j<k)
>>                    {
>>                      # if there is a duplicate
>>                      if(length(which(hum==hum[j]))>1)
>>                        {
>>                          # removes all the duplicates except the first
>>                          hum <- hum[-(which(hum==hum[j])[-1]),]
>>                          #reset the values
>>                          if(is.null(dim(hum)[1])==TRUE)
>>                            {
>>                              # this will exit the loop if "hum" is now a single value
>>                              k=0
>>                            }
>>                          else
>>                            {
>>                              k=dim(hum)[1]
>>                              j=j+1
>>                            }
>>                        }
>>                    }
>>                }
>>
>>              for (j in 1:length(hum))
>>                {
>>                  homologs <- rbind(homologs,c(x[i],hum[j]))
>>                }
>>
>>            }
>>          else #if no homolog was found
>>            {
>>              no.homolog <- c(no.homolog,x[i])
>>            }
>>        }
>>      colnames(homologs) <- 
>> c(paste(from.sp,"ID",sep="."),paste(to.sp,"ID",sep="."))
>>      list(homologs=data.frame(homologs),no.homolog=no.homolog)
>>    }
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>   
> 
>
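
On the question in the original post about keeping a loop running when a
single query raises an error: one common approach in R is to wrap each call
in tryCatch() and record the IDs that failed. The following is only a
minimal sketch under stated assumptions, not the method used in the thread.
fake_lookup() and lookup_all() are hypothetical names introduced here, and
fake_lookup() is a stand-in that imitates the reported behaviour (an error
for "380768", a hit for "66645", NULL otherwise) so that the example runs
without a network connection.

```r
# Hypothetical stand-in for a remote query such as getHomolog().
# It raises an error for one id to mimic "no lines available in input".
fake_lookup <- function(id) {
  if (id == "380768") stop("no lines available in input")
  if (id == "66645") return("55269")
  NULL  # no homolog found
}

# Apply `lookup` to each id, catching errors so the loop keeps going.
lookup_all <- function(ids, lookup) {
  found <- list()
  no_match <- character(0)
  failed <- character(0)
  for (id in ids) {
    # On error, return the condition object instead of aborting the loop
    res <- tryCatch(lookup(id), error = function(e) e)
    if (inherits(res, "error")) {
      failed <- c(failed, id)       # the query itself raised an error
    } else if (is.null(res)) {
      no_match <- c(no_match, id)   # the query worked but returned nothing
    } else {
      found[[id]] <- res
    }
  }
  list(found = found, no_match = no_match, failed = failed)
}

out <- lookup_all(c("66645", "380768", "73663"), fake_lookup)
print(out$failed)    # ids whose query raised an error
print(out$no_match)  # ids with no result
```

In real use, the lookup argument would presumably be a small function that
calls getHomolog() with the marts and ID types from the post. The IDs
collected in out$failed could then be excluded from the batch or retried
later.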


