[BioC] biomaRt question -- not getting a gene

steffen at stat.Berkeley.EDU steffen at stat.Berkeley.EDU
Thu Aug 21 19:33:50 CEST 2008


Hi Elizabeth,

It would be great if you could report this to helpdesk at ensembl.org.

Ideally, when you see inconsistencies like this, rerun the biomaRt queries
with verbose=TRUE set in the getBM call.

This will print out the exact XML query that is sent to the Ensembl
BioMart.  Add this XML message to your email to the helpdesk, and they can
then use it to figure out what is going on.
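
As a rough sketch of what that could look like for the query quoted below
(the function name fetch_coords_verbose is made up for illustration, and
the mart name, dataset name and attribute names are assumptions that may
differ between Ensembl releases):

```r
# Sketch only: assumes the biomaRt package is installed and that the mart,
# dataset and attribute names below are valid for your Ensembl release.
# fetch_coords_verbose is a hypothetical helper, not part of biomaRt.
fetch_coords_verbose <- function(chromosomes = c(1:22, "X", "Y")) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "start_position", "end_position",
                   "strand", "chromosome_name", "biotype"),
    filters    = "chromosome_name",
    values     = chromosomes,
    mart       = mart,
    verbose    = TRUE  # prints the XML query that is sent to the server
  )
}

# Needs network access, so it is left commented out here:
# tempAll <- fetch_coords_verbose()
```

The XML printed to the console is the part to paste into the helpdesk
email.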

biomaRt only provides an interface to the Ensembl BioMart system and
doesn't change anything in the query results.  So whatever Ensembl gives
back is returned by getBM unchanged.
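
To tell the helpdesk exactly which genes are missing, it may help to
compare the two result sets locally first.  The following is only a sketch
on made-up toy data (GENE_A, GENE_B and GENE_C are placeholders, and the
data frame names are hypothetical); with real results you would use the
data frames returned by the two getBM calls instead.

```r
# Toy stand-ins for the two query results. Apart from ENSG00000011677,
# the gene IDs are invented for illustration.
by_chromosome <- data.frame(
  ensembl_gene_id = c("GENE_A", "GENE_B"),
  chromosome_name = c("1", "2"),
  stringsAsFactors = FALSE
)
by_biotype <- data.frame(
  ensembl_gene_id = c("GENE_A", "GENE_B", "ENSG00000011677", "GENE_C"),
  chromosome_name = c("1", "2", "X", "5"),
  stringsAsFactors = FALSE
)

# Rows present in the biotype-filtered result but absent from the
# chromosome-filtered one.
is_missing <- !(by_biotype$ensembl_gene_id %in% by_chromosome$ensembl_gene_id)
missing_rows <- by_biotype[is_missing, ]

# Count the missing genes per chromosome.
missing_per_chr <- table(missing_rows$chromosome_name)
print(missing_per_chr)
```

A per-chromosome count like this, together with the XML query, should make
the problem easier to reproduce on the Ensembl side.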

Cheers,
Steffen

> Hello,
> I am baffled by something I happened to discover in the results of my
> query with biomaRt and I can't figure out what's going on. I am using
> getBM to pull down a large number of gene coordinates, and filtering to
> restrict to chromosomes 1-22 and X,Y. For some reason this procedure
> (which is giving no errors) is not pulling down some genes that I think
> it should.
>
> My basic code for pulling down all of this information is:
> tempAll <- getBM(attributes = c("ensembl_gene_id", "start_position",
>                                 "end_position", "strand",
>                                 "chromosome_name", "biotype"),
>                  filters = "chromosome_name",
>                  values = c(1:22, "X", "Y"),
>                  mart = mart)
>
> A particular gene, "ENSG00000011677", is found by 'getGene' (and other
> getBM queries with different filters, as I discuss below) but not in my
> main query:
>  > getGene("ENSG00000011677","ensembl_gene_id",mart)
>    ensembl_gene_id hgnc_symbol
> 1 ENSG00000011677      GABRA3
>
>                                                      description
> 1 Gamma-aminobutyric acid receptor subunit alpha-3 precursor (GABA(A)
> receptor subunit alpha-3). [Source:Uniprot/SWISSPROT;Acc:P34903]
>    chromosome_name band strand start_position end_position ensembl_gene_id
> 1               X  q28     -1      151086290    151370993 ENSG00000011677
>  > tempAll[match("ENSG00000011677",tempAll$ensembl_gene_id),]
>     ensembl_gene_id start_position end_position strand chromosome_name
> biotype
> NA            <NA>             NA           NA     NA            <NA>
>   <NA>
>
> Oddly, if I change my main code to filter chromosome_name on just
> "X", just c("X","Y"), just c(1,"X"), or a couple of other combinations
> I picked, then this gene correctly appears. It also appears if I filter
> on 'biotype' equals 'protein_coding'. I won't show all of these results
> unless someone wants them, but I copied and pasted the code each time,
> so the filter values were definitely the only thing changing.
>
> When I looked, of the 21,021 genes on chr1-22,X,Y brought down with
> filter of 'biotype' equals 'protein_coding', only 16,236 of them were in
> my main query that limited by chromosome ('tempAll' above). The ~5,000
> missing ones are only in chr 5-9 and X,Y. I'm thinking there is some
> matching problem going on but I don't know where (and if it's my error
> or not).
>
> For now I'm just pulling it all down and filtering myself, but I would
> like to know what's going on here.
>
> Best,
> Elizabeth