[BioC] Search queries with biomaRt does not align with online queries via ensembl

Hotz, Hans-Rudolf hrh at fmi.ch
Mon Mar 1 09:31:24 CET 2010




On 2/28/10 7:16 PM, "Tony Chiang" <tchiang at fhcrc.org> wrote:

> Hi Steffen et al,
> 
> Quick question about a search query via biomaRt. Here is the code that I am
> using:
> 
> *****
> library(biomaRt)
> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> filters = listFilters(ensembl)
> attributes = listAttributes(ensembl)
> getBM(attributes=c("ensembl_peptide_id", "entrezgene",
>                "ensembl_gene_id", "hgnc_automatic_gene_name"),
>                filters="hgnc_automatic_gene_name", values="ATF4",
>                mart=ensembl)
> *****

Try filters="hgnc_symbol" instead, e.g.:


> getBM(attributes=c("ensembl_peptide_id", "entrezgene", "ensembl_gene_id",
+                    "hgnc_automatic_gene_name"),
+       filters="hgnc_symbol", values="ATF4", mart=ensembl)
  ensembl_peptide_id entrezgene ensembl_gene_id hgnc_automatic_gene_name
1    ENSP00000384587        468 ENSG00000128272                       NA
2    ENSP00000336790        468 ENSG00000128272                       NA
3    ENSP00000379912        468 ENSG00000128272                       NA
> 
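
If you are unsure which filter name to use, you can search the table that
listFilters() returns. The following is only a minimal sketch: find_filters
is a hypothetical helper (not part of biomaRt), and it assumes the table has
"name" and "description" columns, which is what I see here.

```r
# Hypothetical helper (not part of biomaRt): search a filter table, such as
# the one returned by listFilters(), for rows whose name or description
# matches a pattern. Assumes columns called "name" and "description".
find_filters <- function(filter_table, pattern) {
  hit <- grepl(pattern, filter_table$name, ignore.case = TRUE) |
         grepl(pattern, filter_table$description, ignore.case = TRUE)
  filter_table[hit, , drop = FALSE]
}

# With a live connection (not run here):
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# find_filters(listFilters(ensembl), "hgnc")
```

The same helper should work on the output of listAttributes(), as long as
that table has the same two columns.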



Hans

> For me, this returns an empty data frame. But when I query ATF4 online at
> ensembl, I find what I need. I also looked up ATF4 at genenames.org (HUGO)
> and it seems that ATF4 is a valid hgnc gene name, so the filter should be fine.
> I guess the only other reason that I can see is which dataset I use in the
> useMart function. I am guessing that the online API will search through all
> datasets while I am only specifying a single one? If this is true, do you
> know of a sensible workaround? I have about 150 genes that I would like
> mapped to the Ensembl ID names but using the code above with a vector of gene
> names, I can only map around 25...but if I manually query for some of the
> non-mapped gene names, I get what I am after. If I am wrong about my guess
> in the dataset, can you let me know what you think might be going on?
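
The dataset is probably not the problem, since the same dataset does return
ATF4 once the filter is "hgnc_symbol". For your list of about 150 genes, a
rough sketch of how one might see which symbols come back empty is below.
unmapped_symbols is a hypothetical helper (not part of biomaRt), and I am
assuming that "hgnc_symbol" is also available as an attribute, so that the
result carries the symbol you queried with.

```r
# Hypothetical helper (not part of biomaRt): given the symbols that were
# queried and the data frame returned by getBM(), return the symbols that
# do not appear in the result. Matching is exact and case-sensitive.
unmapped_symbols <- function(symbols, result, column = "hgnc_symbol") {
  setdiff(unique(symbols), unique(as.character(result[[column]])))
}

# With a live connection (not run here); the symbols are only examples:
# symbols <- c("ATF4", "TP53")
# res <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id"),
#              filters = "hgnc_symbol", values = symbols, mart = ensembl)
# unmapped_symbols(symbols, res)
```

Whatever is left over can then be checked by hand, for example for outdated
or alias symbols.
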
> 
> Tony
> 
>> sessionInfo()
> R version 2.11.0 Under development (unstable) (2010-01-16 r50993)
> i386-apple-darwin10.2.0
> 
> locale:
> [1] en_US.utf-8/en_US.utf-8/C/C/en_US.utf-8/en_US.utf-8
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
>  [1] hgu133plus2.db_2.3.5 org.Hs.eg.db_2.3.6   Rgraphviz_1.25.1
>  [4] biomaRt_2.3.0        GOstats_2.13.0       RSQLite_0.8-1
>  [7] DBI_0.2-5            Category_2.13.0      AnnotationDbi_1.9.4
> [10] Biobase_2.7.3        RBGL_1.23.0          graph_1.25.5
> 
> loaded via a namespace (and not attached):
>  [1] annotate_1.25.1   genefilter_1.29.5 GO.db_2.3.5       GSEABase_1.9.0
>  [5] RCurl_1.3-1       splines_2.11.0    survival_2.35-8   tools_2.11.0
>  [9] XML_2.6-0         xtable_1.5-6
> 


