[BioC] biomaRt: connection stopping

James W. MacDonald jmacdon at med.umich.edu
Wed Sep 13 17:44:00 CEST 2006


J.delasHeras at ed.ac.uk wrote:
> Hi,
> 
> I suspect this is something to do purely with my connection, but I 
> thought I'd ask, just in case:
> 
> I have a list of refseq ids (NM_xxxxx), 18028 of them.
> I wanted to get the gene symbols for those genes, so I used biomaRt on 
> the whole list. What I got was a single column data frame longer than 
> 18028, as I get multiple results with some of these refseq ids. There 
> doesn't seem to be an easy way to regroup them together, so I do the 
> following instead:

Using the RCurl interface for a big query like that isn't ideal. You 
would be better off installing RMySQL and using the MySQL interface 
(note: you can get RMySQL using biocLite(), thanks to the fine folks in 
Seattle). Also, you can have getBM() put things in a list, so any 
duplicated gene symbols will be grouped together.

A <- getBM("hgnc_symbol", "refseq_dna", RS, mart = mart,
           output = "list", mysql = TRUE)

Should do the trick.
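
If you would rather stay with the data frame output, the regrouping can also be done in base R. This is only a minimal sketch: it assumes you request the RefSeq ID as an attribute alongside the symbol, so that each row carries its own key, and that the columns are named after the attributes. The data frame and IDs below are made up to stand in for a real result.

```r
# Toy stand-in for a query result that requested both attributes,
# e.g. attributes = c("refseq_dna", "hgnc_symbol"). IDs and symbols are made up.
res <- data.frame(
  refseq_dna  = c("NM_000001", "NM_000001", "NM_000002"),
  hgnc_symbol = c("GENEA", "GENEB", "GENEC"),
  stringsAsFactors = FALSE
)

# One list element per RefSeq ID, holding every symbol returned for it.
grouped <- split(res$hgnc_symbol, res$refseq_dna)

# IDs that returned nothing are absent from 'grouped'; reindex by the full
# query vector so the list lines up with the input (missing IDs become NULL).
RS_toy <- c("NM_000001", "NM_000002", "NM_000003")
A_toy <- setNames(lapply(RS_toy, function(id) grouped[[id]]), RS_toy)
```

Here A_toy has one element per queried ID, in query order, which is the shape the loop in the quoted message was building.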

HTH,

Jim


> 
> #create an empty list of the right length
> A<-vector(mode="list", length=18028)
> #now loop filling elements of the list from the biomaRt queries
> for (i in 1:18028){
> K<-i
> A[[i]]<-getBM(attributes=c("hgnc_symbol"),mart=mart,filters="refseq_dna",values=c(RS[i]))
> }
> print(K)
> 
> RS is a vector containing the 18028 refseq ids.
> the K value is only so that I know where it breaks... because that's 
> what happens... after a while, it breaks with an error message:
> 
> Error in postForm(paste(mart@host, "?", sep = ""), query = xmlQuery) :
>          couldn't connect to host
> 
> This doesn't happen if I send the whole query in ONE go, in a vector... 
> but if I do it element by element it breaks after 3-4000 queries.
> Any ideas to do this in a simpler/better way? Or at least one that 
> doesn't have me coming back to re-start the loop at the position of the 
> last break?
> 
> thanks!
> 
> Jose
> 
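
On the restart problem in the loop quoted above: if the query does have to be split up, sending a few hundred IDs per request and retrying a failed request automatically avoids both the thousands of connections and the manual restarts. The following is a rough sketch, not part of biomaRt. query_in_chunks, query_fn, chunk_size, max_tries and wait are all hypothetical names, and query_fn stands in for whatever function performs the real lookup on a vector of IDs. The flaky() function only simulates one dropped connection so the example runs offline.

```r
# Run query_fn on 'ids' in chunks, retrying each chunk up to max_tries times.
# query_fn is a hypothetical stand-in for the real lookup on a vector of IDs.
query_in_chunks <- function(ids, query_fn, chunk_size = 500,
                            max_tries = 3, wait = 0) {
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  results <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    for (attempt in seq_len(max_tries)) {
      # Capture an error as a value instead of aborting the whole run.
      out <- tryCatch(query_fn(chunks[[i]]), error = function(e) e)
      if (!inherits(out, "error")) break
      Sys.sleep(wait)  # pause before the next attempt
    }
    if (inherits(out, "error")) {
      stop("chunk ", i, " failed: ", conditionMessage(out))
    }
    results[[i]] <- out
  }
  do.call(rbind, results)
}

# Simulated lookup that fails once, on its second call.
calls <- 0
flaky <- function(ids) {
  calls <<- calls + 1
  if (calls == 2) stop("couldn't connect to host")
  data.frame(id = ids, symbol = toupper(ids), stringsAsFactors = FALSE)
}

out <- query_in_chunks(c("a", "b", "c", "d", "e"), flaky, chunk_size = 2)
```

In real use a non-zero wait between attempts is probably kinder to the server, and the chunk size is a guess to be tuned.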


-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

