[BioC] [BioMart Users] biomaRt returning multiple columns out of order

Laurent Gatto laurent.gatto at gmail.com
Wed Oct 19 15:52:07 CEST 2011


Dear all,

Any update about the column order in biomaRt results?
I have come across the same issue, as illustrated below.

> library(biomaRt)
> mart = useMart("plants_mart_10","athaliana_eg_gene")
> ans <- getBM(attributes=c("tair_locus","peptide"), filters="tair_locus", values=c("AT3G18780","AT2G26300"), mart=mart, verbose=TRUE)
<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
virtualSchemaName = 'default' uniqueRows = '1' count = '0'
datasetConfigVersion = '0.6' requestid= "biomaRt"> <Dataset name =
'athaliana_eg_gene'><Attribute name = 'tair_locus'/><Attribute name =
'peptide'/><Filter name = 'tair_locus' value = 'AT3G18780,AT2G26300'
/></Dataset></Query>
> ans
                     tair_locus
1 MAEADDI[...]ASLIDQILFRILLHAN*
2 MGLLCSR[...]VKKRRRNLLEAGLL*
3 MAEADDI[...]ILASAGPGIVHRKCF*
    peptide
1 AT3G18780
2 AT2G26300
3 AT3G18780

I see the same for useMart("ensembl","ensembl_gene_id") using
ensembl_gene_id or ensembl_exon_id as filters.
In these cases, datasetConfigVersion is also 0.6, if that's of any help.
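
Until the ordering is sorted out, a possible stopgap for queries like the
one above, where the filter attribute is also among the requested
attributes, is to relabel the columns by content. The sketch below is only
an illustration under that assumption: relabel_by_filter() is a
hypothetical helper, not part of biomaRt, and the data frame is built by
hand from the printed output rather than fetched from a server. It finds
the one column whose entries are all among the queried filter values and
swaps its label with the column currently carrying the filter name.

```r
# Hypothetical helper (not part of biomaRt): make sure the column named
# after the filter is the one that actually holds the filter values.
relabel_by_filter <- function(res, filter_name, filter_values) {
  # For each column, check whether every entry is one of the queried values
  holds_ids <- vapply(
    res,
    function(col) all(as.character(col) %in% filter_values),
    logical(1)
  )
  if (sum(holds_ids) != 1L) {
    stop("cannot identify a unique column holding the filter values")
  }
  actual <- which(holds_ids)
  labelled <- match(filter_name, names(res))
  if (is.na(labelled)) {
    stop("filter_name is not a column name of res")
  }
  if (actual != labelled) {
    # Swap the two labels; this fully repairs a two-column result only
    nms <- names(res)
    nms[c(actual, labelled)] <- nms[c(labelled, actual)]
    names(res) <- nms
  }
  res
}

# Hand-built copy of the mislabelled result shown above
# (peptides abbreviated exactly as in the printed output)
ans <- data.frame(
  tair_locus = c("MAEADDI[...]ASLIDQILFRILLHAN*",
                 "MGLLCSR[...]VKKRRRNLLEAGLL*",
                 "MAEADDI[...]ILASAGPGIVHRKCF*"),
  peptide = c("AT3G18780", "AT2G26300", "AT3G18780"),
  stringsAsFactors = FALSE
)

fixed <- relabel_by_filter(ans, "tair_locus", c("AT3G18780", "AT2G26300"))
```

With more than two attributes this only guarantees that the filter column
is labelled correctly; the remaining columns would still need to be checked
by hand, and it does not help when the filter is not one of the requested
attributes.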

> sessionInfo()
R Under development (unstable) (2011-10-13 r57241)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.9.3

loaded via a namespace (and not attached):
[1] RCurl_1.6-10 XML_3.4-3

Best wishes,

Laurent

On 30 September 2011 23:11, Richard Hayes <rdhayes at lbl.gov> wrote:
> On Fri, Sep 30, 2011 at 2:51 PM, Steffen Durinck <sdurinck at gmail.com> wrote:
>
>> Hi Richard, Arek,
>>
>> If you set verbose=TRUE in your getBM query you'll see the XML query that
>> is sent to the BioMart server (see below for your example).
>> The order of the attributes in the XML query is usually the same order we
>> get the results back from the BioMart server.
>> However for your example this is not the case and there is no way for
>> biomaRt to know this (Arek correct me if this is not the case), so when we
>> add column names to the returned matrix they will be wrong when the query
>> order is not preserved in the returned result.
>>
>> > multiTest = getBM(attributes= c("organism_name", "transcript_name",
>> "exon_chrom_start", "exon_chrom_end"), filters="orgid", values="167",
>> mart=phyto,verbose=TRUE)
>>
>> <?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
>>  virtualSchemaName = 'default' uniqueRows = '1' count = '0'
>> datasetConfigVersion = '0.6' requestid= "biomaRt"> <Dataset name =
>> 'phytozome'><Attribute name = 'organism_name'/><Attribute name =
>> 'transcript_name'/><Attribute name = 'exon_chrom_start'/><Attribute name =
>> 'exon_chrom_end'/><Filter name = 'orgid' value = '167' /></Dataset></Query>
>>
>>
> Okay, I see that on my end as well. Is this a consequence of biomart v0.6 on
> the backend that would be alleviated by our plans to upgrade to 0.7 soon?
>
>
>>
>> Cheers,
>> Steffen
>>
>>
>> On Thu, Sep 29, 2011 at 9:08 AM, Arek Kasprzyk <arek.kasprzyk at gmail.com> wrote:
>>
>>> Hi Richard,
>>> the best person to help you is Steffen Durinck, the original biomaRt coder
>>> (cc'ed on this email)
>>>
>>> a
>>>
>>> On Wed, Sep 28, 2011 at 3:52 PM, Richard Hayes <rdhayes at lbl.gov> wrote:
>>>
>>>> Hi,
>>>>
>>>> Our group maintains the biomart instance at the Phytozome plant genomics
>>>> portal. We've had some users report problems with the result sets from the
>>>> biomaRt interface. It is unclear if this is a biomaRt problem or a problem
>>>> in our mart configuration. At the moment, we are still running biomart
>>>> version 0.6, but are hoping to upgrade in the very near future to 0.7.
>>>>
>>>> I had been testing with R 2.12.2 and biomaRt 2.6.0, but then upgraded to
>>>> R 2.13.1 and biomaRt 2.8.1. The problems persist with these latest software
>>>> releases.
>>>>
>>>> I can successfully connect to our mart and the main genome transcript
>>>> dataset as follows, successfully retrieving a single column of transcript
>>>> names for Arabidopsis thaliana using our internal "orgid" filter for
>>>> organism ID 167:
>>>>
>>>> > library('biomaRt')
>>>> > phyto=useMart('phytozome_mart', dataset='phytozome')
>>>> > transcripts = getBM(attributes = c("transcript_name"), filters=
>>>> "orgid", values="167", mart=phyto)
>>>> > transcripts[1:5,]
>>>> [1] "AT2G38230.1" "AT2G39920.2" "AT2G26530.1" "AT2G28630.1" "AT2G19280.1"
>>>>
>>>> However, when I construct a multicolumn query, the columns are not
>>>> returned in the expected order:
>>>>
>>>> > multiTest = getBM(attributes= c("organism_name", "transcript_name",
>>>> "exon_chrom_start", "exon_chrom_end"), filters="orgid", values="167",
>>>> mart=phyto)
>>>> > multiTest[1:5,]
>>>>   organism_name transcript_name exon_chrom_start exon_chrom_end
>>>> 1   AT5G47220.1        19171862         19172823      Athaliana
>>>> 2   AT1G71920.3        27067059         27067098      Athaliana
>>>> 3   AT1G71920.3        27067189         27067401      Athaliana
>>>> 4   AT1G71920.3        27067506         27067589      Athaliana
>>>> 5   AT1G71920.3        27067706         27067860      Athaliana
>>>>
>>>> Any help diagnosing the source of this problem is much appreciated.
>>>>
>>>> Best regards,
>>>>
>>>> --
>>>> Richard D. Hayes, Ph.D.
>>>> Joint Genome Institute / Lawrence Berkeley National Lab
>>>> http://www.phytozome.net
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at biomart.org
>>>> https://lists.biomart.org/mailman/listinfo/users
>>>>
>>>>
>>>
>>
>
>
> --
> Richard D. Hayes, Ph.D.
> Joint Genome Institute / Lawrence Berkeley National Lab
> http://www.phytozome.net
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
[ Laurent Gatto | slashhome.be ]


