[BioC] possible bug in getBM{biomaRt}

Wolfgang Huber huber at ebi.ac.uk
Fri Apr 3 21:24:36 CEST 2009


Dear Rhoda, Teresa

if I try that suggestion by running:


library("biomaRt")
mart = useMart("ensembl")
ensembl = useDataset("hsapiens_gene_ensembl", mart = mart);

library("hgu133plus2.db")
EID = toTable(hgu133plus2ENTREZID)
set.seed(0xbadbeef)
I = sample(nrow(EID), 100)
p = EID[I,"probe_id"]

Q1 = getBM(attributes = c("affy_hg_u133_plus_2", "entrezgene"),
           filters = "affy_hg_u133_plus_2", mart = ensembl, values = p)
Q2 = getBM(attributes = c("affy_hg_u133_plus_2", "validated"),
           filters = "affy_hg_u133_plus_2", mart = ensembl, values = p)


I get:

Error in getBM(attributes = c("affy_hg_u133_plus_2", "validated"), 
filters = "affy_hg_u133_plus_2",  :
   Query ERROR: caught BioMart::Exception::Usage: Attributes from 
multiple attribute pages are not allowed


(Note that this is the same error that Teresa encountered, just that in 
more recent versions of biomaRt the error message is more informative.)
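
A pre-flight check for this error can be sketched as follows. It assumes an attribute table with 'name' and 'page' columns is available (listAttributes() in biomaRt may provide one, depending on the installed version). The table below is invented for illustration, and pages_for() is a hypothetical helper, not a biomaRt function.

```r
# Hypothetical attribute table: the 'page' values are invented for
# illustration and are not taken from the Ensembl mart.
attr_table <- data.frame(
  name = c("affy_hg_u133_plus_2", "entrezgene", "validated"),
  page = c("features", "features", "variations"),
  stringsAsFactors = FALSE
)

# pages_for() is a hypothetical helper (not part of biomaRt).
# It groups the requested attribute names by the page they sit on.
pages_for <- function(attributes, table) {
  hit <- table[match(attributes, table$name), ]
  split(hit$name, hit$page)
}

groups <- pages_for(c("affy_hg_u133_plus_2", "entrezgene", "validated"),
                    attr_table)

# More than one group means the attributes span several pages, so a
# single query would be expected to fail and must be split up.
print(groups)
print(length(groups) > 1)
```

Each element of 'groups' could then be sent as its own query, provided a common key attribute is added to each one.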

OTOH, if I only do

Q3=getBM(attributes=c("validated"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p)

this returns successfully, but then Q3 has 27 rows and there seems to be no 
way to find out which of the filter values each row belongs to, i.e., 
it is useless.

So what's the best way to proceed when one wants to query across 
multiple pages like in Teresa's example?
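
One possible route, sketched here under assumptions: if some key attribute is offered on both pages, each query can carry that key and the two results can be joined afterwards with merge(). Whether "ensembl_gene_id" (or any other key) is really available on the Variations page is an assumption that would need checking against the mart. The data frames below are made-up stand-ins for two getBM() results, not real Ensembl output.

```r
# Made-up stand-in for a Features-page query, e.g. with
# attributes = c("affy_hg_u133_plus_2", "ensembl_gene_id")
features <- data.frame(
  affy_hg_u133_plus_2 = c("probe_A", "probe_B", "probe_C"),
  ensembl_gene_id     = c("GENE1", "GENE2", "GENE3"),
  stringsAsFactors    = FALSE
)

# Made-up stand-in for a Variations-page query, e.g. with
# attributes = c("ensembl_gene_id", "validated")
# (assumes the key attribute exists on that page)
variations <- data.frame(
  ensembl_gene_id  = c("GENE1", "GENE1", "GENE3"),
  validated        = c("cluster", "freq", "hapmap"),
  stringsAsFactors = FALSE
)

# Left join on the shared key: every probe is kept, and probes whose
# gene has no variation rows get NA in 'validated'.
combined <- merge(features, variations, by = "ensembl_gene_id", all.x = TRUE)
print(combined)
```

Using all.x = TRUE keeps probes without any variation record visible in the result instead of silently dropping them.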

  Best wishes
      Wolfgang

----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber


 > sessionInfo()
R version 2.10.0 Under development (unstable) (2009-04-02 r48271)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=la_AU.UTF-8;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] hgu133plus2.db_2.2.11 RSQLite_0.7-1         DBI_0.2-4
[4] AnnotationDbi_1.5.23  Biobase_2.3.11        biomaRt_1.99.8
[7] fortunes_1.3-6

loaded via a namespace (and not attached):
[1] RCurl_0.94-1 XML_2.3-0



Rhoda Kinsella wrote:
> Hi Teresa,
> The Ensembl mart attributes are set up as 5 separate categories or 
> "pages". If you take a look at the martview interface you will see that 
> these categories are: Features, Homologs, Structures, Sequences and 
> Variations. At present, it is not possible to mix attributes from 
> multiple sections as you will get the error message you received (i.e. 
> in your query the "validated" attribute came from the "Variations" 
> section and the rest of the attributes came from the "Features" 
> section).  The way around this is to perform two separate queries; one 
> to select the features attributes and one to retrieve the variations 
> attribute. You may also be able to link to two separate datasets, one 
> for the validated part of the query and the other for the features part 
> of the query and pull out all the information you need. I'm not sure how 
> this is done using biomaRt, but perhaps someone else from the mailing 
> list can help you to do this.
> I hope that helps,
> Regards,
> Rhoda
> 
> 
> On 3 Apr 2009, at 15:09, Teresa Colombo wrote:
> 
>> Dear list,
>>
>> it looks like there may be a bug in function 'getBM' affecting the use of
>> attribute 'validated':
>>
>> -------------------------------------------------------------------------------------------------------------------- 
>>
>>> library(biomaRt)
>>> mart = useMart("ensembl")
>>> ensembl = useDataset("hsapiens_gene_ensembl", mart = mart);
>>
>>> library(hgu133plus2.db);
>>> EID <- toTable(hgu133plus2ENTREZID);
>>> I <- sample(dim(EID)[1],100)
>>> p <- EID[I,"probe_id"];
>>> head(p)
>> [1] "205953_at"   "214718_at"   "233633_at"   "241572_at"   "224704_at"
>> [6] "221510_s_at"
>>
>>
>>> Q <-
>> getBM(attributes=c("affy_hg_u133_plus_2","entrezgene","validated"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p); 
>>
>>
>>                                                                          
>> V1
>> 1 Query ERROR: caught BioMart::Exception::Usage: Attributes from multiple
>> attribute pages are not allowed
>> Error in getBM(attributes = c("affy_hg_u133_plus_2", "entrezgene",
>> "validated"),  :
>>  Number of columns in the query result doesn't equal number of attributes
>> in query.  This is probably an internal error, please report.
>>
>>
>> -------------------------------------------------------------------------------------------------------------------- 
>>
>> The above error message disappears when running the same query after 
>> having
>> removed "validated" from the list of attributes.
>> -------------------------------------------------------------------------------------------------------------------- 
>>
>>> Q <-
>> getBM(attributes=c("affy_hg_u133_plus_2","entrezgene"),filters="affy_hg_u133_plus_2",mart=ensembl,values=p); 
>>
>>> head(Q)
>>  affy_hg_u133_plus_2 entrezgene
>> 1        1552538_a_at         NA
>> 2        1552538_a_at     221458
>> 3        1554485_s_at     140738
>> 4        1555097_a_at       5737
>> 5          1564015_at         NA
>> 6        1564198_a_at     118611
>>
>>
>>> sessionInfo()
>> R version 2.8.1 (2008-12-22)
>> i486-pc-linux-gnu
>>
>> locale:
>> LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C 
>>
>>
>> attached base packages:
>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] hgu133plus2.db_2.2.5 RSQLite_0.7-1        DBI_0.2-4
>>  [4] AnnotationDbi_1.4.3  Biobase_2.2.2        biomaRt_1.16.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_0.94-1 XML_2.1-0
>> -------------------------------------------------------------------------------------------------------------------- 
>>
>>
>>
>> Best wishes
>> teresa
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> Rhoda Kinsella Ph.D.
> Ensembl Bioinformatician,
> European Bioinformatics Institute (EMBL-EBI),
> Wellcome Trust Genome Campus,
> Hinxton
> Cambridge CB10 1SD,
> UK.


