[BioC] issue of genome build versions when using biomaRt

Joern Toedling toedling at ebi.ac.uk
Fri Nov 7 13:04:12 CET 2008


If you use the Ensembl BioMart, you get access to the genome builds that
are (or were) current in that release of Ensembl. For example, in the
current Ensembl release (50), the build is NCBI36 for H. sapiens. You can
see which genome build each dataset in that release is based on with:

library(biomaRt)
ensembl <- useMart("ensembl")
listDatasets(ensembl)

In some cases, you can get older genome builds by using the marts of
archived, previous Ensembl releases. To see which archive marts are
available, use

listMarts(archive=TRUE)

For example,

ensembl43 <- useMart("ensembl_mart_43", archive=TRUE)
listDatasets(ensembl43)

shows you the genome builds in Ensembl release 43.
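
If you only want the build for one dataset rather than the whole table,
something along these lines may help. It is a rough sketch, not part of
biomaRt: build_for() is a hypothetical helper, and it assumes that the
table returned by listDatasets() has columns named "dataset" and
"version". The small stand-in data frame is only there so the example
runs without a network connection.

```r
# Hypothetical helper (not part of biomaRt): look up the genome build
# recorded for one dataset in a listDatasets()-style data frame.
# Assumes columns named "dataset" and "version".
build_for <- function(datasets, dataset_name) {
  hit <- datasets[datasets$dataset == dataset_name, , drop = FALSE]
  if (nrow(hit) == 0) {
    return(NA_character_)  # dataset not present in this mart
  }
  as.character(hit$version[1])
}

# Stand-in table so the sketch runs offline. NCBI36 is the build
# mentioned in this message for H. sapiens in release 50.
example_datasets <- data.frame(
  dataset = "hsapiens_gene_ensembl",
  version = "NCBI36",
  stringsAsFactors = FALSE
)

human_build <- build_for(example_datasets, "hsapiens_gene_ensembl")
print(human_build)
```

With a live connection you would pass listDatasets(ensembl), or the
table from an archive mart, in place of example_datasets.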

However, there does not seem to be an archive mart old enough to give
you access to NCBI35 for H. sapiens. Someone please correct me if they
know better.
So I am afraid that you will have to resort to other sources for the UTR
sequences in NCBI35.

Best regards,

Al Tango wrote:
> Hi all, Although this seems like a frequently asked question, I didn't
> find it in the archives.
> When specifying chromosomal coordinates for a region using biomaRt or
> other BioC packages, how can I know the version of the genome assembly
> being retrieved, and is it possible to define a particular version to
> use?
> E.g., I am searching for the 5'UTR sequence of gene(s) within a region this way:
> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> getSequence(chromosome=3, start=185514033, end=185535839,
> type="entrezgene", seqType="5utr", mart=ensembl)
> My questions: does it treat the start/end coordinates as in the latest
> version of build 36 (2006)? Can I opt for build 35 or hg17 (2004)?
> Thanks for your help in advance.

Joern Toedling
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
Phone  +44(0)1223 492566
Email  toedling at ebi.ac.uk