[BioC] Retrieving gene name where given genomic region is included.

Vincent Carey stvjc at channing.harvard.edu
Tue Jan 19 04:20:15 CET 2010


This looks like a job for GenomicFeatures and IRanges.  There are
various approaches, but this seems to help with your specific example.
 Get your software into a shape like the one indicated in the
sessionInfo below, and then do

library(GenomicFeatures)
library(GenomicFeatures.Hsapiens.UCSC.hg18)
genes = geneHuman()
g10 = genes[genes$chrom=="chr10",]
dim(g10)
g10i = RangedData(IRanges(start=g10$txStart, end=g10$txEnd),
space="chr10", name=g10$name)
g10i
boel = IRanges(start=17317394, end=17317851)  # could be richer
findOverlaps(boel, ranges(g10i)[["chr10"]])
g10i[298:307,]
kgn = unique(g10i[298:307,]$name)
mget(kgn, revmap(org.Hs.egUCSCKG))

to find

> get("7431", org.Hs.egGENENAME)
[1] "vimentin"



> sessionInfo()
R version 2.10.1 RC (2009-12-10 r50697)
i386-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
 [1] org.Hs.eg.db_2.3.6
 [2] RSQLite_0.7-3
 [3] DBI_0.2-5
 [4] AnnotationDbi_1.7.20
 [5] Biobase_2.5.8
 [6] IRanges_1.5.16
 [7] GenomicFeatures.Hsapiens.UCSC.hg18_0.1.0
 [8] GenomicFeatures_0.1.4
 [9] rtracklayer_1.7.2
[10] RCurl_1.2-1
[11] bitops_1.0-4.1
[12] weaver_1.11.1
[13] codetools_0.2-2
[14] digest_0.4.1

loaded via a namespace (and not attached):
 [1] BSgenome_1.15.3   Biostrings_2.15.5 MASS_7.3-4        XML_2.6-0
 [5] annotate_1.23.4   globaltest_5.1.1  multtest_2.3.0    splines_2.10.1
 [9] survival_2.35-7   xtable_1.5-5



On Mon, Jan 18, 2010 at 12:28 PM, Boel Brynedal <Boel.Brynedal at ki.se> wrote:
> Dear List,
>
> I have long lists of genomic regions (chr;start;end) where a given event
> has taken place. These regions can be an exon, an intronic region, or
> similar.  Most (all) of these events have taken place within the
> boundaries of genes, and I would like to retrieve the gene names
> (ensemble ID).
>
> I've tried to use biomaRt:
>>
> getBM(attributes=c("ensembl_gene_id"),filter=c("chromosome_name","start","end"),
> values=list(10,17317394,17317851), mart=ensembl)
> [1] ensembl_gene_id
> <0 rows> (or 0-length row.names)
>
> But since no whole GENE is within these boundaries, I get nothing. i've
> also tried asking for "ensembl_exon_id" when looking at exon events (not
> all of them are of that kind however), and this generally results in a
> long list of exon IDs (because one exon can be part of several transcripts).
>
> I would appreciate any ideas of how this could be done in a better way.
>
> Thank you!
>
> Boel
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list