[BioC] Problem with getBM function in biomaRt package

Steffen Durinck durincks at mail.nih.gov
Wed Jul 5 16:31:43 CEST 2006


Hi Luo,

If you request a list as output, biomaRt sends one query per id to the 
server, which in your case means 8000 separate queries.  This is not well 
suited to large query vectors.  Have you tried using biomaRt with the 
default output (a data.frame)?

genelist = getBM(attributes = c("hgnc_symbol", "description"),
                 filter = "entrezgene", values = igenes, mart = mart)


You should have no problems querying > 8000 ids when using the default 
output.
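
If the per-id grouping is the main reason for wanting a list, it can also 
be rebuilt locally from the data.frame.  The following is only a rough 
sketch: it assumes the returned data.frame carries the query id in a 
column named "entrezgene" (for example because that attribute was 
requested as well), and it uses a small made-up data.frame in place of a 
real getBM() result.  The helper split_by_id is a hypothetical name, not 
part of biomaRt.

```r
# Stand-in for the data.frame that getBM() would return; the rows below are
# made-up placeholders, not real query results.
res <- data.frame(
  entrezgene  = c("1001", "1001", "1002", "1003"),
  hgnc_symbol = c("SYM_A", "SYM_A2", "SYM_B", NA),
  description = c("placeholder description A", "placeholder description A2",
                  "placeholder description B", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: group the annotation rows by query id, so that each id
# maps to its own small data.frame (possibly with several rows).
split_by_id <- function(df, id_col) {
  split(df[, setdiff(names(df), id_col), drop = FALSE], df[[id_col]])
}

genelist_by_id <- split_by_id(res, "entrezgene")

names(genelist_by_id)       # ids that had at least one row
genelist_by_id[["1001"]]    # all rows returned for this id
```

This keeps the single bulk query and does the grouping in R, so the number 
of server round trips does not grow with the number of ids.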

If you do need list output and have many ids, then I would recommend 
using biomaRt in RMySQL mode.

Best,
Steffen

Luo Weijun wrote:
> Hello all, 
> I am trying to get gene symbols and full gene names
> (description) for a long list of (>=8000) genes. I use the
> getBM function in the biomaRt package, and the code is
> pretty much the same as in Jim's 'HowTo: get pretty
> HTML output for my gene list' vignette. Everything
> works fine when I use a much shorter list (100 genes),
> i.e. igenes = hs95av2Entrezg7[1:100] in the following
> code. But when igenes = hs95av2Entrezg7 (the full gene
> list), getBM doesn't work and returns an error
> message. 
>
>   
>> library(biomaRt)
>>     
> Loading required package: XML
> Loading required package: RCurl
>   
>> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
>>     
> Checking attributes and filters ... ok
>   
> load('/Users/luow/project/microarraydata/annotation/hs95av2Entrezg7.Rdata')
>   
>> igenes=hs95av2Entrezg7
>>     
> <escription"), filter = "entrezgene",values = igenes,
> mart = mart, output = "list",na.value ='')            
>                              
> ##(note here my original input is:
>  genelist=getBM(attributes =
> c("hgnc_symbol","description"), filter =
> "entrezgene",values = igenes, mart = mart, output =
> "list",na.value ='')
> ##and this long line is truncated in the terminal
> screen somehow)
> Error in postForm(paste(mart@host, "?", sep = ""),
> query = xmlQuery) : 
>         couldn't connect to host
>   
>
> Since Jim also suggests that RMySQL is much faster
> than RCurl, I also tried to install the RMySQL package,
> but the error message says there is no such package,
> even though I did see RMySQL in the contributed
> package list on all the CRAN mirror sites I
> tried. I am not sure what the problem is.
>
>   
>> install.packages('RMySQL', repos =
>>     
> "http://www.biometrics.mtu.edu/CRAN/")      
> Warning in download.packages(pkgs, destdir = tmpd,
> available = available,  : 
>          no package 'RMySQL' at the repositories
>   
>
> Here is my session info 
>   
>> sessionInfo()
>>     
> Version 2.3.1 (2006-06-01) 
> powerpc-apple-darwin8.6.0 
>
> attached base packages:
> [1] "methods"   "stats"     "graphics"  "grDevices"
> "utils"     "datasets" 
> [7] "base"     
>
> other attached packages:
>  biomaRt    RCurl      XML 
>  "1.6.0"  "0.6-2" "0.99-7" 
>   
>
> I actually can't even run sessionInfo() after the getBM
> line failed.
>   
>> sessionInfo()
>>     
> Error in gzfile(file, "rb") : unable to open
> connection
> In addition: Warning messages:
> 1: list.files:
> '/Library/Frameworks/R.framework/Resources/library' is
> not a readable directory 
> 2: cannot open compressed file
> '/Library/Frameworks/R.framework/Resources/library/biomaRt/Meta/package.rds'
>
>   
>
> Thank you so much for your kind help!
> Weijun
>


