[BioC] Help using biomaRt

steffen at stat.Berkeley.EDU steffen at stat.Berkeley.EDU
Thu Jan 29 04:44:00 CET 2009


Hi John,

You can do:

> library(biomaRt)
Loading required package: RCurl
> mart = useMart("ensembl", dataset="hsapiens_gene_ensembl")
Checking attributes and filters ... ok
> att=listAttributes(mart, category="Features")
> att[1:10,]
                  name         description
1         affy_hc_g110        Affy HC G110
2        affy_hg_focus       Affy HG FOCUS
3  affy_hg_u133_plus_2 Affy HG U133-PLUS-2
4        affy_hg_u133a       Affy HG U133A
5      affy_hg_u133a_2     Affy HG U133A_2
6        affy_hg_u133b       Affy HG U133B
7         affy_hg_u95a        Affy HG U95A
8       affy_hg_u95av2      Affy HG U95AV2
9         affy_hg_u95b        Affy HG U95B
10        affy_hg_u95c        Affy HG U95C

To get an overview of the possible categories, you can do:

> attributeSummary(mart)
     category                         group
1    Features                     EXTERNAL:
2    Features                   EXPRESSION:
3    Features                         GENE:
4    Features              PROTEIN DOMAINS:
5    Homologs              AEDES ORTHOLOGS:

The listAttributes call above lists, for example, all attributes in the
Features category (page).  The error you saw arises because a single
BioMart query may only use attributes from one page.
This is a bit easier to see in the web-browser-based BioMart
interface (try one at http://www.biomart.org).  There, attributes are
grouped on separate web pages, and because only the attributes of one page
are visible at a time, users cannot select attributes from different pages.

The dev version of biomaRt now checks on the R side whether all attributes
are from the same page and issues a warning before sending the query to
the BioMart web service.  Note also that the dev version uses the term
"page" instead of "category", to be more consistent with the BioMart
nomenclature.
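
With the release version, a similar check can be sketched by hand.  The
snippet below is only an illustration under stated assumptions: the att
data frame is typed in from the listAttributes(mart, showGroups = TRUE)
output quoted further down instead of being fetched from the web service,
and the helper name pagesOf is made up for this example; it is not a
biomaRt function.

```r
# Hypothetical stand-in for listAttributes(mart, showGroups = TRUE);
# rows copied by hand from the output quoted in this thread.
att <- data.frame(
  name = c("affy_moex_1_0_st_v1", "canonical_transcript_stable_id",
           "ensembl_gene_id", "ensembl_exon_id", "rank"),
  category = c("Features", "Features", "Features",
               "Structures", "Structures"),
  stringsAsFactors = FALSE
)

# pagesOf is an illustrative helper (an assumption, not part of biomaRt):
# it returns the distinct categories (pages) a set of attributes falls in.
pagesOf <- function(attnames, att) {
  unknown <- setdiff(attnames, att$name)
  if (length(unknown) > 0)
    stop("unknown attribute(s): ", paste(unknown, collapse = ", "))
  unique(att$category[match(attnames, att$name)])
}

attnames <- c("affy_moex_1_0_st_v1", "ensembl_gene_id", "ensembl_exon_id")
pages <- pagesOf(attnames, att)
if (length(pages) > 1)
  message("attributes come from more than one page: ",
          paste(pages, collapse = ", "))

# Group the requested attributes by page, one candidate query per page
byPage <- split(attnames, att$category[match(attnames, att$name)])
```

Each element of byPage could then go into its own getBM call, much like
the two-query approach you ended up with.  The web service still has the
final say on which combinations it accepts.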

Cheers,
Steffen


>
> Hi All
>
> And thanks to all those that have helped.
>
> My biomart query is now working fine and I can get what I want with two
> queries, the second query using a different filter.
>
> But I still feel I built my query by trial and error rather than in an
> intelligent way. I have used Steffen's suggestion to use listAttributes
> to understand the attributes grouping/category but that does not seem to
> help me much.
>
>
> So the question is how could I have avoided blundering into this error:
>
> #1 Query ERROR: caught BioMart::Exception::Usage: Attributes from
> multiple attribute pages are not allowed
>
>
> Here is an example of me going wrong:
>
> library(biomaRt)
>
>
> # Connect to the mart
> mart<-useMart("ensembl")
> # Update the mart connection to use this dataset
> mart<-useDataset("mmusculus_gene_ensembl", mart=mart)
>
> # Just select a couple of affyids for testing
> affyids<-c("5325539", "5555964")
>
>
> # Build vector of filters
> filtnames<-"affy_moex_1_0_st_v1"
> # Build vector of attributes
> attnames<-c("affy_moex_1_0_st_v1", "ensembl_gene_id",
> "canonical_transcript_stable_id")
> # Submit biomart query
> results1<-getBM(attnames, filters=filtnames, values=affyids, mart=mart)
>
> # Great, that works.
> # But I would like to get some exon information as well
> # Thinks - how?
> # Check the attributes categories/groups:
> #listAttributes(mart, showGroups = TRUE)[c(9, 17, 28, 30, 678, 682),]
> #                              name                       description     group   category
> #9              affy_moex_1_0_st_v1                         Affy MoEx EXTERNAL:   Features
> #17  canonical_transcript_stable_id Canonical transcript stable ID(s)     GENE:   Features
> #28                 ensembl_gene_id                   Ensembl Gene ID     GENE:   Features
> #30           ensembl_transcript_id             Ensembl Transcript ID     GENE:   Features
> #678                ensembl_exon_id                   Ensembl Exon ID     EXON: Structures
> #682                           rank           Exon Rank in Transcript     EXON: Structures
>
> # Oh dear. Looks like exon information is in EXON: Structures
> # I guess that will not work ...
> # But I do not know what I am doing, so try adding exon rank anyway.
>
> # Build vector of attributes
> attnames<-c("affy_moex_1_0_st_v1", "ensembl_gene_id",
> "canonical_transcript_stable_id", "rank")
> # Submit biomart query
> results2<-getBM(attnames, filters=filtnames, values=affyids, mart=mart)
>
> # Great, that works! So add the exon id attribute.
>
> # Build vector of attributes
> attnames<-c("affy_moex_1_0_st_v1", "ensembl_gene_id",
> "canonical_transcript_stable_id", "rank", "ensembl_exon_id")
> # Submit biomart query
> results3<-getBM(attnames, filters=filtnames, values=affyids, mart=mart)
>
> # Fails ....
> #
> #   V1
> #1 Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple attribute pages are not allowed
> #Error in getBM(attnames, filters = filtnames, values = affyids, mart = mart) :
> #  Number of columns in the query result doesn't equal number of attributes in query.  This is probably an internal error, please report.
>
>
>
> # Disconnect from biomart
> martDisconnect(mart)
>
> ########################################################################
> ##############################################
>
>
>
>
>
> ########################################################################
> #########################################
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
>
> other attached packages:
> [1] biomaRt_1.16.0 RWinEdt_1.8-0
>
> loaded via a namespace (and not attached):
> [1] RCurl_0.94-0 XML_1.99-0
>>
>
>
> ---
>
>


