[BioC] biomaRt: connection stopping

J.delasHeras at ed.ac.uk J.delasHeras at ed.ac.uk
Wed Sep 13 17:20:50 CEST 2006


Hi,

I suspect this is something to do purely with my connection, but I 
thought I'd ask, just in case:

I have a list of refseq ids (NM_xxxxx), 18028 of them.
I wanted to get the gene symbols for those genes, so I used biomaRt on 
the whole list. What I got was a single column data frame longer than 
18028, as I get multiple results with some of these refseq ids. There 
doesn't seem to be an easy way to regroup them together, so I do the 
following instead:

# create an empty list of the right length
A <- vector(mode = "list", length = 18028)
# now loop, filling elements of the list from the biomaRt queries
for (i in 1:18028) {
  K <- i  # remember the last index attempted
  A[[i]] <- getBM(attributes = c("hgnc_symbol"), mart = mart,
                  filters = "refseq_dna", values = c(RS[i]))
}
print(K)

RS is a vector containing the 18028 refseq ids.
The K value is only there so that I know where it breaks, because that's 
what happens: after a while, the loop stops with an error message:

Error in postForm(paste(mart@host, "?", sep = ""), query = xmlQuery) :
         couldn't connect to host

This doesn't happen if I send the whole query in ONE go, as a vector, 
but if I do it element by element it breaks after 3000-4000 queries.
Any ideas to do this in a simpler/better way? Or at least one that 
doesn't have me coming back to re-start the loop at the position of the 
last break?
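
For what it's worth, one thing that might work is to stay with the single 
batched query but also ask for the refseq id as an attribute, so that each 
returned row says which id it belongs to, and then regroup with split(). 
The following is only a sketch under assumptions: I have not confirmed that 
the mart accepts "refseq_dna" as an attribute as well as a filter, 
group_symbols() is a made-up helper (not part of biomaRt), and the toy data 
frame stands in for a real getBM() result.

```r
# Sketch only: regroup gene symbols by refseq id after ONE batched query.
# Assumption: "refseq_dna" can be requested as an attribute too, e.g.
#   res <- getBM(attributes = c("refseq_dna", "hgnc_symbol"),
#                filters = "refseq_dna", values = RS, mart = mart)

# group_symbols() is a hypothetical helper, not a biomaRt function.
group_symbols <- function(res, ids) {
  # split() collects the symbols that share the same refseq id
  by_id <- split(as.character(res$hgnc_symbol),
                 as.character(res$refseq_dna))
  # one list element per input id, in input order;
  # ids without any hit get an empty character vector
  out <- lapply(ids, function(id) {
    hits <- by_id[[id]]
    if (is.null(hits)) character(0) else unique(hits)
  })
  names(out) <- ids
  out
}

# Toy data standing in for a getBM() result (made-up ids and symbols)
res <- data.frame(
  refseq_dna  = c("NM_000001", "NM_000001", "NM_000003"),
  hgnc_symbol = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)
RS <- c("NM_000001", "NM_000002", "NM_000003")
A <- group_symbols(res, RS)
```

Here A would be a list with one element per entry of RS, like the list 
built by the loop, but obtained from a single query rather than thousands.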

thanks!

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK


