[BioC] possible bug in getBM{biomaRt}

Wolfgang Huber huber at ebi.ac.uk
Mon Apr 6 18:36:47 CEST 2009


Dear Rhoda & Teresa

thanks! That clarifies - the two query results are linked by 
"ensembl_gene_id". There is no need to open two different connections in 
biomaRt, so example code including merging and some clean up could look 
like:


library("biomaRt")
mart = useMart("ensembl")
ensembl = useDataset("hsapiens_gene_ensembl", mart = mart)

p = c("200641_s_at" , "229411_at" , "223376_s_at")

Q1 = getBM(attributes =
              c("affy_hg_u133_plus_2","entrezgene","ensembl_gene_id"),
            filters = "affy_hg_u133_plus_2", mart=ensembl, values=p)

Q2 = getBM(attributes=c("validated","ensembl_gene_id"),
            filters="affy_hg_u133_plus_2", mart=ensembl, values=p)

## clean up:
## Q1 = subset(Q1, !is.na(entrezgene))

Q = merge(Q1,Q2)



Best wishes
      Wolfgang

----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber

Rhoda Kinsella wrote:
> Hi Wolfgang and Teresa,
> In the query below, Q1 and Q2 are still accessing the same mart 
> (mart=ensembl) so this will try to get the attributes from multiple 
> pages of the same mart. You will need to set up two distinct mart 
> datasets (mart1 and mart2) and as Ensembl is gene centric, you will also 
> have to include something for the two datasets to link on (e.g. Ensembl 
> gene ID or Ensembl transcript ID as these are present on the two pages 
> that you are trying to access). I think you need to do something like this:
> 
>  > library(biomaRt)
>  > mart1 = useMart("ensembl")
>  > mart2 = useMart("ensembl")
>  > ensembl1 = useDataset("hsapiens_gene_ensembl",mart=mart1);
> Checking attributes and filters ... ok
>  > ensembl2 = useDataset("hsapiens_gene_ensembl",mart=mart2);
> Checking attributes and filters ... ok
>  > getBM(attributes=c("affy_hg_u133_plus_2","entrezgene", 
> "ensembl_gene_id"), 
> filters="affy_hg_u133_plus_2",mart=ensembl1,values="205953_at");
> 
>   affy_hg_u133_plus_2 entrezgene ensembl_gene_id
> 1           205953_at       9860 ENSG00000198799
> 
>  > getBM(attributes=c("validated", 
> "ensembl_gene_id"),filters="affy_hg_u133_plus_2",mart=ensembl2, 
> values="205953_at");
>                                  validated ensembl_gene_id
> 1                                          ENSG00000198799
> 2                                   hapmap ENSG00000198799
> 3  cluster,freq,submitter,doublehit,hapmap ENSG00000198799
> 4                              freq,hapmap ENSG00000198799
> 5            cluster,freq,submitter,hapmap ENSG00000198799
> 6                    freq,doublehit,hapmap ENSG00000198799
> 7                        cluster,doublehit ENSG00000198799
> 8                                  cluster ENSG00000198799
> 9                      cluster,freq,hapmap ENSG00000198799
> 10           cluster,freq,doublehit,hapmap ENSG00000198799
> 11                  cluster,freq,doublehit ENSG00000198799
> 12                                    freq ENSG00000198799
> 13                               doublehit ENSG00000198799
> 14                          cluster,hapmap ENSG00000198799
> 15                   freq,submitter,hapmap ENSG00000198799
> 
> I am sure that there is a more elegant way of doing the linking using 
> biomaRt. I hope this helps but if anything is not clear, please don't 
> hesitate to get in touch.
> Regards,
> Rhoda
> 
> 
>> Dear Rhoda, Teresa
>>
>> if I try that suggestion, by:
>>
>>
>> library("biomaRt")
>> mart = useMart("ensembl")
>> ensembl = useDataset("hsapiens_gene_ensembl", mart = mart);
>>
>> library("hgu133plus2.db")
>> EID = toTable(hgu133plus2ENTREZID)
>> set.seed(0xbadbeef)
>> I = sample(nrow(EID), 100)
>> p = EID[I,"probe_id"]
>>
>> Q1=getBM(attributes=c("affy_hg_u133_plus_2","entrezgene"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p)
>> Q2=getBM(attributes=c("affy_hg_u133_plus_2","validated"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p)
>>
>>
>> I get:
>>
>> Error in getBM(attributes = c("affy_hg_u133_plus_2", "validated"), 
>> filters = "affy_hg_u133_plus_2",  :
>>  Query ERROR: caught BioMart::Exception::Usage: Attributes from 
>> multiple attribute pages are not allowed
>>
>>
>> (Note that this is the same error that Teresa encountered, just that 
>> in more recent versions of biomaRt the error message is more informative.)
>>
>> OTOH, if I only do
>>
>> Q3=getBM(attributes=c("validated"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p)
>>
>> this returns successfully, but then Q3 has 27 rows and there seems no 
>> way to find out to which of the filter values the rows belong to - 
>> i.e., it is useless.
>>
>> So what's the best way to proceed when one wants to query across 
>> multiple pages like in Teresa's example?
>>
>> Best wishes
>>     Wolfgang
>>
>> ----------------------------------------------------
>> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>>
>>
>> > sessionInfo()
>> R version 2.10.0 Under development (unstable) (2009-04-02 r48271)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=la_AU.UTF-8;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices datasets  utils     methods   base
>>
>> other attached packages:
>> [1] hgu133plus2.db_2.2.11 RSQLite_0.7-1         DBI_0.2-4
>> [4] AnnotationDbi_1.5.23  Biobase_2.3.11        biomaRt_1.99.8
>> [7] fortunes_1.3-6
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_0.94-1 XML_2.3-0
>>
>>
>>
>> Rhoda Kinsella wrote:
>>> Hi Teresa,
>>> The Ensembl mart attributes are set up as 4 separate categories or 
>>> "pages". If you take a look at the martview interface you will see 
>>> that these categories are: Features, Homologs, Structures, Sequences 
>>> and Variations. At present, it is not possible to mix attributes from 
>>> multiple sections as you will get the error message you received 
>>> (i.e. in your query the "validated" attribute came from the 
>>> "Variations" section and the rest of the attributes came from the 
>>> "Features" section).  The way around this is to perform two separate 
>>> queries; one to select the features attributes and one to retrieve 
>>> the variations attribute. You may also be able to link to two 
>>> separate datasets, one for the validated part of the query and the 
>>> other for the features part of the query and pull out all the 
>>> information you need. I'm not sure how this is done using biomaRt, 
>>> but perhaps someone else from the mailing list can help you to do this.
>>> I hope that helps,
>>> Regards,
>>> Rhoda
>>> On 3 Apr 2009, at 15:09, Teresa Colombo wrote:
>>>> Dear list,
>>>>
>>>> it looks like there may be a bug in function 'getBM' affecting the 
>>>> use of
>>>> attribute 'validated':
>>>>
>>>> -------------------------------------------------------------------------------------------------------------------- 
>>>>
>>>>> library(biomaRt)
>>>>> mart = useMart("ensembl")
>>>>> ensembl = useDataset("hsapiens_gene_ensembl", mart = mart);
>>>>
>>>>> library(hgu133plus2.db);
>>>>> EID <- toTable(hgu133plus2ENTREZID);
>>>>> I <- sample(dim(EID)[1],100)
>>>>> p <- EID[I,"probe_id"];
>>>>> head(p)
>>>> [1] "205953_at"   "214718_at"   "233633_at"   "241572_at"   "224704_at"
>>>> [6] "221510_s_at"
>>>>
>>>>
>>>>> Q <-
>>>> getBM(attributes=c("affy_hg_u133_plus_2","entrezgene","validated"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p); 
>>>>
>>>>
>>>>                                                                         V1
>>>> 1 Query ERROR: caught BioMart::Exception::Usage: Attributes from 
>>>> multiple
>>>> attribute pages are not allowed
>>>> Errore in getBM(attributes = c("affy_hg_u133_plus_2", "entrezgene",
>>>> "validated"),  :
>>>> Number of columns in the query result doesn't equal number of attributes
>>>> in query.  This is probably an internal error, please report.
>>>>
>>>>
>>>> -------------------------------------------------------------------------------------------------------------------- 
>>>>
>>>> The above error message disappears when running the same query after 
>>>> having
>>>> removed "validated" from the list of attributes.
>>>> -------------------------------------------------------------------------------------------------------------------- 
>>>>
>>>>> Q <-
>>>> getBM(attributes=c("affy_hg_u133_plus_2","entrezgene"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p); 
>>>>
>>>>> head(Q)
>>>> affy_hg_u133_plus_2 entrezgene
>>>> 1        1552538_a_at         NA
>>>> 2        1552538_a_at     221458
>>>> 3        1554485_s_at     140738
>>>> 4        1555097_a_at       5737
>>>> 5          1564015_at         NA
>>>> 6        1564198_a_at     118611
>>>>
>>>>
>>>>> sessionInfo()
>>>> R version 2.8.1 (2008-12-22)
>>>> i486-pc-linux-gnu
>>>>
>>>> locale:
>>>> LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C 
>>>>
>>>>
>>>> attached base packages:
>>>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] hgu133plus2.db_2.2.5 RSQLite_0.7-1        DBI_0.2-4
>>>> [4] AnnotationDbi_1.4.3  Biobase_2.2.2        biomaRt_1.16.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] RCurl_0.94-1 XML_2.1-0
>>>> -------------------------------------------------------------------------------------------------------------------- 
>>>>
>>>>
>>>>
>>>> Best wishes
>>>> teresa
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch <mailto:Bioconductor at stat.math.ethz.ch>
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> Rhoda Kinsella Ph.D.
>>> Ensembl Bioinformatician,
>>> European Bioinformatics Institute (EMBL-EBI),
>>> Wellcome Trust Genome Campus,
>>> Hinxton
>>> Cambridge CB10 1SD,
>>> UK.
> 
> Rhoda Kinsella Ph.D.
> Ensembl Bioinformatician,
> European Bioinformatics Institute (EMBL-EBI),
> Wellcome Trust Genome Campus, 
> Hinxton
> Cambridge CB10 1SD,
> UK.
>



More information about the Bioconductor mailing list