[BioC] Biomart query in Web interface Vs. biomaRt package?

J.J.P.Lebrec at lumc.nl J.J.P.Lebrec at lumc.nl
Tue Oct 9 09:59:54 CEST 2007


Hi Steffen,

There does not seem to be a 'biol_process' filter in dataset 'hsapiens_gene_ensembl' (see below). 

> human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
Checking attributes and filters ... ok
> getBM(attributes = "external_gene_id", filters = "biol_process", values = "GO:0006996", mart = human)
Error in getBM(attributes = "external_gene_id", filters = "biol_process",  : 
        Invalid filters(s): biol_process 
Please use the function 'listFilters' to get valid filter names
>
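
One way to follow the hint in the error message is to search the filter table for GO-related names. This is only a minimal sketch: it assumes the table has the `name` and `description` columns that `listFilters()` returns, and both the helper `find_filters` and the small stand-in table are hypothetical, so that the example runs without a connection to the mart.

```r
# Hypothetical helper: keep rows whose name or description matches a pattern.
find_filters <- function(filter_table, pattern) {
  hit <- grepl(pattern, filter_table$name, ignore.case = TRUE) |
         grepl(pattern, filter_table$description, ignore.case = TRUE)
  filter_table[hit, , drop = FALSE]
}

# Stand-in for listFilters(human); these rows are illustrative, not real output.
toy_filters <- data.frame(
  name = c("chromosome_name", "go", "hypothetical_go_flag"),
  description = c("Chromosome name", "GO ID(s)", "Illustrative GO-related filter"),
  stringsAsFactors = FALSE
)

go_filters <- find_filters(toy_filters, "go")
print(go_filters$name)

# With a live connection one would call instead:
# find_filters(listFilters(human), "go")
```

Searching the descriptions as well as the names helps when a filter is labelled differently in the web interface than in the package.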

So I tried to reproduce the gene list from the web query (which yields exactly 1140 unique genes) by filtering on GO:0006996 together with all of its biological-process offspring terms:

> library(GO)
> dim( unique( getBM(attributes = c("external_gene_id"), filters = "go", values = c("GO:0006996",GOBPOFFSPRING$"GO:0006996"), mart = human) ) )
[1] 1143    1

The two gene lists have 1137 genes in common and I cannot explain this remaining discrepancy.
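
To narrow this down, the genes unique to each list can be pulled out with base R set operations. A minimal sketch follows; the gene symbols are hypothetical placeholders for the two real lists, and only the counts in the last lines come from the numbers reported in this message.

```r
# Toy stand-ins for the two gene lists (hypothetical symbols).
web_genes     <- c("geneA", "geneB", "geneC", "geneD")
biomart_genes <- c("geneB", "geneC", "geneD", "geneE", "geneF")

common       <- intersect(web_genes, biomart_genes)  # in both lists
only_web     <- setdiff(web_genes, biomart_genes)    # web export only
only_biomart <- setdiff(biomart_genes, web_genes)    # biomaRt result only

# Counts reported in this message: 1140 (web), 1143 (biomaRt), 1137 shared.
n_only_web     <- 1140 - 1137  # genes seen only in the web export
n_only_biomart <- 1143 - 1137  # genes seen only in the biomaRt result
```

With the reported counts, 3 genes would appear only in the web export and 6 only in the biomaRt result, so listing those 9 symbols is the natural next step.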

Thanks again for enquiring about this,

Jérémie

-----Original Message-----
From: Steffen [mailto:sdurinck at lbl.gov] 
Sent: Monday, 8 October 2007 18:14
To: Lebrec, J.J.P. (MSTAT)
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Biomart query in Web interface Vs. biomaRt package?

Hi Jeremie,

Below is the answer from the Ensembl helpdesk. In short, the 'go' filter
retrieves all genes associated with a particular GO identifier, whereas
the 'biol_process' filter retrieves all genes associated with that GO
identifier and all of its children. This explains why one gets more
genes with 'biol_process' than with 'go' as the filter. (The Ensembl
BioMart web interface uses 'biol_process', and you used 'go' in your
biomaRt query.)
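
The difference between the two lookups can be illustrated on a toy ontology. This is only a sketch of the idea, not how BioMart implements either filter: the term IDs, the parent-to-children map, the gene names and the helper `with_descendants` are all hypothetical.

```r
# Hypothetical parent -> children map and term -> genes annotations.
children <- list(
  "TERM:root" = c("TERM:a", "TERM:b"),
  "TERM:a"    = c("TERM:a1"),
  "TERM:b"    = character(0),
  "TERM:a1"   = character(0)
)
annotations <- list(
  "TERM:root" = c("gene1"),
  "TERM:a"    = c("gene2", "gene3"),
  "TERM:b"    = c("gene3"),
  "TERM:a1"   = c("gene4")
)

# Collect a term together with all of its descendants (depth-first).
with_descendants <- function(term) {
  kids <- children[[term]]
  unique(c(term, unlist(lapply(kids, with_descendants))))
}

# 'go'-style lookup: genes annotated directly with the term.
direct_genes <- unique(annotations[["TERM:root"]])

# 'biol_process'-style lookup: genes annotated with the term or any descendant.
expanded_terms <- with_descendants("TERM:root")
expanded_genes <- unique(unname(unlist(annotations[expanded_terms])))
```

Here the direct lookup finds one gene while the expanded lookup finds four, which mirrors the small versus large result sets discussed in this thread.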

Cheers,
Steffen

-----

When you query BioMart filtering on a specific GO term (GO:0006996, or a
list of terms), you retrieve all the entries associated with that term
or those terms. But if you filter using 'Biological process' and then
add an ID, you get all the entries matching that ID and all of its
children:

organelle organization and biogenesis [GO:0006996]
autophagic vacuole formation [GO:0000045]
chromosome organization and biogenesis [GO:0051276]
chromosome condensation [GO:0030261]
chromosome decondensation [GO:0051312]
chromosome organization and biogenesis (sensu Bacteria) [GO:0051277]
chromosome organization and biogenesis (sensu Eukaryota) [GO:0007001]
chromosome breakage [GO:0031052]
establishment and/or maintenance of chromatin architecture [GO:0006325]
karyosome formation [GO:0030717]
....     

As seen here:
http://www.ensembl.org/Homo_sapiens/goview?depth=2;query=organelle+organization+and+biogenesis

I hope this explains it,
-- Xose M Fernandez (Ensembl User Support)



J.J.P.Lebrec at lumc.nl wrote:
> Hi,
>
> Using the web based Biomart tool (
> http://www.ensembl.org/biomart/martview/ ) in database=Ensembl 46,
> dataset=Homo sapiens Genes (NCBI 36), I have manually extracted all
> unique genes' 'External Gene ID' using GO pathway GO:0006996 as a
> filter. I obtained 1141 unique genes.
>
> I tried to automate the process using the biomaRt package with the query
> below, which yielded only 9 unique genes!
>
> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> Checking attributes and filters ... ok
> > getBM(attributes = "external_gene_id", filters = "go", values = "GO:0006996", mart = human)
>    external_gene_id
> 1             KIF3A
> 2              HPS3
> 3              HPS3
> 4            DTNBP1
> 5            DTNBP1
> 6             KIF5C
> 7             KIF4A
> 8              HPS1
> 9              HPS6
> 10             HPS6
> 11             HPS6
> 12            KIF25
> 13             HPS4
>
> > sessionInfo()
> R version 2.5.1 (2007-06-27) 
> i386-pc-mingw32 
>
> locale:
> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
>
> attached base packages:
> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
> [7] "base"
>
> other attached packages:
>  biomaRt    RCurl      XML 
> "1.10.1"  "0.8-0"  "1.9-0" 
>   
>
> I thought the two queries were equivalent. Could you please tell me
> what I am doing wrong here?
>
> Many thanks in advance,
>
> Jeremie
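
The 'go' query quoted above prints 13 rows but only 9 distinct symbols, because some genes are returned more than once. A minimal sketch of the deduplication step, with the symbols copied from that output into a data frame by hand (a stand-in for the live `getBM()` result):

```r
# Gene symbols as returned by the 'go' query quoted above (13 rows).
result <- data.frame(
  external_gene_id = c("KIF3A", "HPS3", "HPS3", "DTNBP1", "DTNBP1", "KIF5C",
                       "KIF4A", "HPS1", "HPS6", "HPS6", "HPS6", "KIF25", "HPS4"),
  stringsAsFactors = FALSE
)

# Drop repeated symbols before counting or comparing gene lists.
unique_genes <- unique(result$external_gene_id)
n_rows   <- nrow(result)
n_unique <- length(unique_genes)
```

Counting after `unique()` keeps the comparison with the web export on the same footing, since that export was also reduced to unique genes.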


