[BioC] biomaRt issues

Wolfgang Huber whuber at embl.de
Tue Sep 8 16:57:27 CEST 2009


Dear Leonardo

Thank you for your clear and helpful problem report!

The lack of returned results (in one case), and the irreproducibility of 
returned results (in another case), seem to be a problem of the UniProt 
mart rather than of the biomaRt package per se. I am cc'ing Jie Luo at EBI 
who, as far as I understand, would be the most appropriate person to 
respond here, and perhaps help to localise and eliminate the problem.
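
Until the server-side cause is found, one way to at least notice the 
irreproducibility is to repeat the query and compare the row counts. The 
following is only a minimal sketch: check_reproducible() is a hypothetical 
helper, not part of biomaRt, and its arguments (query, n, pause) are 
assumptions made for illustration. It takes the query as a function, so it 
works with any getBM() call. Note that agreeing counts do not prove the 
result is complete; they only show that the attempts returned the same 
number of rows.

```r
# Hypothetical helper (not part of biomaRt): call a query function n times
# and compare the number of distinct rows returned by each attempt.
check_reproducible <- function(query, n = 3, pause = 0) {
  stopifnot(is.function(query), n >= 2)
  results <- vector("list", n)
  for (i in seq_len(n)) {
    # Drop duplicated rows so the counts are comparable between attempts
    results[[i]] <- unique(query())
    if (pause > 0 && i < n) Sys.sleep(pause)
  }
  rows <- vapply(results, nrow, integer(1))
  list(
    rows       = rows,                         # row count per attempt
    consistent = length(unique(rows)) == 1L,   # TRUE if all counts agree
    largest    = results[[which.max(rows)]]    # attempt with the most rows
  )
}

# Usage with the query from the report (needs network access and biomaRt):
# library(biomaRt)
# uni <- useMart("uniprot_mart", dataset = "UNIPROT")
# chk <- check_reproducible(function()
#   getBM(attributes = "organism", filters = "superregnum_name",
#         values = "Viruses", mart = uni))
# chk$rows         # one row count per attempt
# chk$consistent   # FALSE signals the problem described in the report
```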

	Best wishes
	Wolfgang


Leonardo Collado Torres wrote:
> Hello BioC users :)
> 
> I'm having some trouble with biomaRt with the uniprot database.
> 
> #I can execute the following code and everything works fine (with ENSEMBL):
> library(biomaRt)
> bsub <- useMart( "bacterial_mart_54", dataset = "bac_6_gene")
> res <- getBM( attributes=c("start_position", "end_position", "strand", 
> "status"), filters= c("start", "end"), values = list("1", "100000"), 
> mart = bsub)
> library(lattice)
> print(xyplot(end_position~start_position | status, group=strand, 
> data=res, auto.key=TRUE))
> 
> # But then, if I want to retrieve the EC numbers and organism info for
> # the viral proteins on Uniprot, this should work
> # (I did it first through http://www.ebi.ac.uk/uniprot/biomart/martview
> # and it worked):
> library(biomaRt)
> uni <- useMart("uniprot_mart", dataset="UNIPROT")
> virus <- getBM(attributes = c("ec_number","organism"), filters = 
> "superregnum_name", values = "Viruses", mart = uni)
> dim(virus)
> [1] 0 2
> # But the virus object has 0 rows. The same happens if I use
> # checkFilters = FALSE.
> # Using the website app, I do get information back.
> # If I check only the "organism" attribute, then I do get some information.
> virus2 <- getBM(attributes = c("organism"), filters = 
> "superregnum_name", values = "Viruses", mart = uni)
> dim(virus2)
> [1] 5063    1
> # However, I re-ran the "virus2" query a few minutes later and got a
> # different result (I checked around 4 times and got the same numbers):
> virus2 <- getBM(attributes = c("organism"), filters = 
> "superregnum_name", values = "Viruses", mart=uni)
> dim(virus2)
> [1] 158   1
> # Then once more after I typed the above lines in this mail, and I got
> # the same original result:
> virus2 <- getBM(attributes = c("organism"), filters = 
> "superregnum_name", values = "Viruses", mart=uni)
> dim(virus2)
> [1] 5063    1
> # I'm pretty sure that I didn't lose my internet connection in the
> # meantime, so I don't really know what is causing this error.
> # I then tried the same lines on a different machine (different network
> # too) and at first I got the same 5063 row value, and then I got:
> virus2 <- getBM(attributes = c("organism"), filters = 
> "superregnum_name", values = "Viruses", mart=uni)
> dim(virus2)
> [1] 8431    1
> # Then 5063 again, etc.
> 
> In the end, 5063 seems to pop up more frequently, but is it the actual 
> result? Is there a way to make sure I'm not missing information without 
> calling getBM multiple times to check that there are no unexpected results?
> I had assigned some homework exercises using biomaRt to access Uniprot, 
> but now I'm confused myself about what's going on :P
> Any tips will be great :) Thanks!
> 
> Leonardo
> 
> 
> # First comp session info
> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-07-21 r48968)
> i386-pc-mingw32
> 
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] lattice_0.17-25 biomaRt_2.1.0
> 
> loaded via a namespace (and not attached):
> [1] grid_2.10.0  RCurl_0.98-1 XML_2.5-1
> 
> # Second comp session info
> sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-08-10 r49131)
> sparc-sun-solaris2.9
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] biomaRt_2.1.0
> 
> loaded via a namespace (and not attached):
> [1] RCurl_1.2-0 XML_2.6-0
> 

-- 


-------------------------------------------------------
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


