[BioC] GenomeGraphs/biomaRt/getBM on older genome builds

Steffen at stat.Berkeley.EDU
Mon Jun 15 05:50:36 CEST 2009


Hi Mark,

GenomeGraphs contains hard-coded filter and attribute names, which it passes
to biomaRt to retrieve gene information.  This can cause compatibility
problems with older archived Ensembl databases: the error in your output,
"Invalid attribute(s): ensembl_exon_id", means the Ensembl 46 mart does not
offer an attribute under that name.  In addition, some internal
representations changed around Ensembl 51, I think, which causes further
compatibility problems.  Archived Ensembl versions >= 51 should work, but it
looks like your gene is not in those either.
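
To check in advance whether an archived mart offers the attribute names
GenomeGraphs needs, one could compare them against the table returned by
biomaRt's listAttributes(), as the error message itself suggests.  A minimal
sketch, assuming the required names are the column names shown in the
quoted print(gr) output (GenomeGraphs may request them under slightly
different names); the helper missingAttributes() is hypothetical.

```r
# Assumed attribute names for a GeneRegion: copied from the columns that
# print(gr) shows, not from the GenomeGraphs source.
geneRegionAttributes <- c("ensembl_gene_id", "ensembl_transcript_id",
                          "ensembl_exon_id", "exon_chrom_start",
                          "exon_chrom_end", "rank", "strand", "biotype")

# Hypothetical helper: return the required names that are absent from a
# mart's attribute table (a data.frame with a 'name' column, such as the
# one biomaRt::listAttributes() returns).
missingAttributes <- function(attrTable, required = geneRegionAttributes) {
  # as.character() handles both factor and character 'name' columns
  setdiff(required, as.character(attrTable$name))
}
```

With the archived mart from the example, missingAttributes(listAttributes(ensembl46))
should list the names getBM() would reject; judging from the error above,
ensembl_exon_id would be among them.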

If you can retrieve the data for this region some other way (e.g. using the
Ensembl web interface for version 46), you could fill in the NAs in the
data.frame stored in gr@ens in your example.  You should then be able to
plot it with gdPlot.
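
A sketch of that workaround, assuming the exon table has already been
obtained some other way (e.g. exported from the Ensembl 46 web interface)
into a data.frame whose column names match those of gr@ens.  The helper
fillEnsSlot() is hypothetical: it only checks and reorders columns and makes
the exon coordinates numeric.  GenomeGraphs may expect particular types or
codings for other columns (e.g. strand), so treat this as a starting point.

```r
# Hypothetical helper: align an externally obtained exon table with the
# column layout of an existing 'ens' data.frame (e.g. gr@ens).
fillEnsSlot <- function(template, exons) {
  # Every column of the template must be present in the new table
  missing <- setdiff(names(template), names(exons))
  if (length(missing) > 0) {
    stop("exon table lacks column(s): ", paste(missing, collapse = ", "))
  }
  # Keep only the template's columns, in the template's order
  out <- exons[, names(template), drop = FALSE]
  # Exon coordinates must be numeric for plotting; going through
  # as.character() converts factor columns by value, not by level code
  for (col in intersect(c("exon_chrom_start", "exon_chrom_end"), names(out))) {
    out[[col]] <- as.numeric(as.character(out[[col]]))
  }
  rownames(out) <- NULL
  out
}
```

Usage would then look something like gr@ens <- fillEnsSlot(gr@ens, exons46),
followed by gdPlot(list(gr), minBase = 61711290, maxBase = 61712548), where
exons46 is the hypothetical exported table.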

Cheers,
Steffen



> Hi all.
>
> I would like to use GenomeGraphs (specifically, a "GeneRegion" object
> plotted with gdPlot()) ... but I have coordinates from an older genome
> build.  When I try to access the older Ensembl mart, I get an error in
> getBM().
>
> Is this even possible?  I would be delighted if it is.  Of course, it
> does give a warning (see below) that some biomaRt functions might not
> work, so perhaps this is futile.  Is there an alternative?
>
> My commands:
>
> --------
> library(GenomeGraphs)
>
> mart <- useMart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
> ds <- listDatasets(mart)
> ds[grep("mus",ds$desc),]
>
> # RPLP1 on mm8 (i.e. not recent) build
> # this will run, but obviously won't find my gene
> gr <- new("GeneRegion", chromosome = "9",
>           start = 61711290, end = 61712548, strand="-",
>           biomart = mart)
>
> print(gr)
>
> # try the archived version
> ensembl46 <- useMart(biomart="ensembl_mart_46",
>                      dataset="mmusculus_gene_ensembl", archive=TRUE)
> ds46 <- listDatasets(ensembl46)
> ds46[grep("mus",ds46$desc),]
>
> gr46 <- new("GeneRegion", chromosome = "9",
>             start = 61711290, end = 61712548, strand="-",
>             biomart = ensembl46)
> --------
>
> My output:
>
>  > library(GenomeGraphs)
> Loading required package: biomaRt
> Loading required package: grid
>  >
>  > mart=useMart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
> Checking attributes ... ok
> Checking filters ... ok
>  > ds <- listDatasets(mart)
>  > ds[grep("mus",ds$desc),]
>                    dataset                  description version
> 43 mmusculus_gene_ensembl Mus musculus genes (NCBIM37) NCBIM37
>  >
>  > # RPLP1 on mm8 (i.e. not recent) build
>  > # this will run, but obviously won't find my gene
>  > gr <- new("GeneRegion", chromosome = "9",
> +                    start = 61711290, end = 61712548, strand="-", biomart = mart)
>  >
>  > print(gr)
> Object of class 'GeneRegion':
>   Start:61709290
>   End:61714548
>   Chromosome: 9
>   Exons in Ensembl:
>     ensembl_gene_id ensembl_transcript_id ensembl_exon_id exon_chrom_start
> NA            <NA>                  <NA>            <NA>             <NA>
>     exon_chrom_end rank strand biotype
> NA           <NA> <NA>   <NA>    <NA>
>
>   There are 0 more rows>
>  >
>  > # try the archived version
>  > ensembl46=useMart(biomart="ensembl_mart_46", dataset="mmusculus_gene_ensembl", archive=TRUE)
> Checking attributes ... ok
> Checking filters ... ok
> Warning messages:
> 1: In bmAttrFilt("attributes", mart) :
>    biomaRt warning: looks like we're connecting to an older version of
> BioMart suite. Some biomaRt functions might not work.
> 2: In bmAttrFilt("filters", mart) :
>    biomaRt warning: looks like we're connecting to an older version of
> BioMart suite. Some biomaRt functions might not work.
>  > ds46 <- listDatasets(ensembl46)
>  > ds46[grep("mus",ds46$desc),]
>                    dataset                  description version
> 34 mmusculus_gene_ensembl Mus musculus genes (NCBIM36) NCBIM36
>  >
>  > gr46 <- new("GeneRegion", chromosome = "9",
> +                    start = 61711290, end = 61712548, strand="-", biomart = ensembl46)
> Error in getBM(c("ensembl_gene_id", "ensembl_transcript_id",
> "ensembl_exon_id",  :
>    Invalid attribute(s): ensembl_exon_id
> Please use the function 'listAttributes' to get valid attribute names
>  >
>  > sessionInfo()
> R version 2.9.0 (2009-04-17)
> i386-apple-darwin8.11.1
>
> locale:
> en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] GenomeGraphs_1.3.5 biomaRt_2.0.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_0.94-1 XML_2.3-0
>
>
> Thanks,
> Mark
>
>
>
> ------------------------------
> Mark Robinson, PhD (Melb)
> Epigenetics Laboratory, Garvan
> Bioinformatics Division, WEHI
> e: m.robinson at garvan.org.au
> e: mrobinson at wehi.edu.au
> p: +61 (0)3 9345 2628
> f: +61 (0)3 9347 0852