[BioC] BiomaRt return value

Steffen at stat.Berkeley.EDU Steffen at stat.Berkeley.EDU
Mon Nov 23 19:27:52 CET 2009


Hi Tony,

I want to add that in the past we also returned the input to the query
(the filter) as an attribute.  However, this does not generalize: for some
attribute/filter pairs the names differ, e.g. "start_position" in the
attribute list and "start" in the filter list, and sometimes a filter has
no corresponding attribute at all.  To make our code more stable we took
this out.  If a user wants such functionality, then I agree with Wolfgang
that it should be a wrapper around getBM.
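
For illustration only, such a wrapper might look roughly like the sketch
below. The name getBMKeepInput and its "query" argument are made up here
and are not part of biomaRt; "query" is only there so the function can be
tried with a stand-in for getBM when no network is available. The sketch
also assumes the filter has an attribute of the same name, which, as
noted above, does not hold for every filter.

```r
# Hypothetical helper, not part of biomaRt: returns at least one row per
# input ID and fills the attributes of unmapped IDs with NA.
getBMKeepInput <- function(attributes, filter, values, mart, query = NULL) {
  # By default use the real getBM; a stand-in can be passed for offline use.
  if (is.null(query)) query <- biomaRt::getBM
  stopifnot(length(filter) == 1)

  # Assumption: the filter name is also a valid attribute name, so the
  # result can be matched back to the input IDs.
  attrs <- union(filter, attributes)
  res <- query(attributes = attrs, filters = filter,
               values = values, mart = mart)

  # One-column data frame holding the distinct input IDs.
  input <- data.frame(unique(values), stringsAsFactors = FALSE)
  names(input) <- filter

  # all.x = TRUE keeps inputs without a hit; their other columns become NA.
  merge(input, res, by = filter, all.x = TRUE)
}
```

With the objects from Tony's example this would be called as
getBMKeepInput(c("entrezgene", "ensembl_gene_id"), "ensembl_peptide_id",
emblID, ensembl), and an ID without a mapping should then appear as a row
of NA values rather than being dropped.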

Cheers,
Steffen


> Hi Tony
>
> thanks for these good ideas. Both of these you could implement in a
> small wrapper function around getBM. Once you find that this is a
> stable, generally useful function, we'd be happy to accept your patch
> for the biomaRt package!
>
> Btw, ENSP00000045065 is a valid protein sequence ID with many hits for
> it in Google, and indeed in the search box at http://www.ebi.ac.uk. The
> fact that the hsapiens_gene_ensembl mart does not know a mapping of it
> to an extant gene name could have all sorts of reasons, historical or
> scientific, which you could explore at the EBI website.
>
> 	Best wishes
> 	Wolfgang
>
>
>   Chiang wrote:
>> Hi Steffen, Sean, Wolfgang,
>>
>> I have a question about the return value of the getBM() function. It is a
>> data frame object, and in the examples that I have seen, usually if I want
>> to map from EMBL IDs to Entrez Gene IDs, we would still also want to map the
>> EMBL IDs back to the EMBL IDs so we know what has mapped to what. Example
>> code to follow if my explanation is not clear:
>>
>> ################
>> library(biomaRt)
>> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
>> filters = listFilters(ensembl)
>> attributes = listAttributes(ensembl)
>> ## Here are my IDs from String
>> test = c("9606.ENSP00000045065", "9606.ENSP00000158762",
>>          "9606.ENSP00000174653", "9606.ENSP00000202967",
>>          "9606.ENSP00000204517", "9606.ENSP00000212015",
>>          "9606.ENSP00000220616", "9606.ENSP00000222008",
>>          "9606.ENSP00000222390", "9606.ENSP00000223051")
>> emblID = sapply(strsplit(test, "\\."), function(x) x[2])
>> ## And the code I am using for the mapping is:
>> getBM(attributes = c("ensembl_peptide_id", "entrezgene", "ensembl_gene_id",
>>                      "hgnc_automatic_gene_name"),
>>       filters = "ensembl_peptide_id", values = emblID, mart = ensembl)
>> ##################
>>
>> So I guess I have two questions. First, would it be a good idea to always
>> return what we input in the output data frame, so we would not have to
>> request the redundant attribute ("ensembl_peptide_id" in my example)? Also,
>> if you run the code, you will see that ENSP00000045065 did not map at all, so
>> I assume that it is not a valid ensembl_peptide_id (this is a bit strange
>> since I am using EMBL IDs); I also want to ask if there is some way to make
>> that more transparent... maybe a row of NA values? I realize that these are
>> not terrible things to work around, but would it not make sense to have
>> this? If not, please let me know.
>>
>> Cheers,
>> --Tony
>>
>>> sessionInfo()
>> R version 2.10.0 Patched (2009-10-27 r50222)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.utf-8/en_US.utf-8/C/C/en_US.utf-8/en_US.utf-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] biomaRt_2.2.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.2-1 XML_2.6-0
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
>
> Best wishes
>       Wolfgang
>
>
> --
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber/contact
>


