[BioC] rtracklayer and gene symbols

James W. MacDonald jmacdon at med.umich.edu
Thu Jul 16 17:36:54 CEST 2009


Hi Christian,

Christian Ruckert wrote:
> Is there an elegant way to find the chromosome, start and end position 
> for a given gene symbol via rtracklayer?

I don't know about using rtracklayer, but there are any number of ways 
to get these data. If you want them straight from UCSC, you can query 
their public MySQL server:

 > library(RMySQL)
Loading required package: DBI
 > con <- dbConnect("MySQL", user = "genome",
 +                  host = "genome-mysql.cse.ucsc.edu", dbname = "hg18")
 > gns <- c("BRIP1","VEGFA","FANCB","TP53")
 > sql <- paste("select name2, txStart, txEnd from refGene ",
 +              "where name2 in ('", paste(gns, collapse = "','"), "');",
 +              sep = "")
 > dbGetQuery(con, sql)
    name2  txStart    txEnd
1  BRIP1 57114766 57295537
2  FANCB 14771449 14801105
3  FANCB 14771449 14801105
4   TP53  7512444  7531588
5   TP53  7512444  7531588
6   TP53  7512444  7519536
7   TP53  7512444  7519536
8   TP53  7512444  7519536
9   TP53  7512444  7531588
10  TP53  7512444  7531588
11 VEGFA 43845930 43862201
12 VEGFA 43845930 43862201
13 VEGFA 43845930 43862201
14 VEGFA 43845930 43862201
15 VEGFA 43845930 43862201
16 VEGFA 43845930 43862201
17 VEGFA 43845930 43862201
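
The query above returns one row per RefSeq transcript, which is why most
symbols appear several times, and it leaves out the chromosome that was
asked for. A minimal sketch of a variant follows, assuming the refGene
table's chrom column and using a hypothetical helper,
build_refgene_query(), that is not part of RMySQL or of the original
session. It only builds the SQL string, so it runs without a database
connection:

```r
# Hypothetical helper (an illustration, not an RMySQL function): build a
# refGene query that also returns the chromosome and drops duplicate rows.
build_refgene_query <- function(symbols) {
  # Accept only plain symbols so a stray quote cannot break the SQL.
  ok <- grepl("^[A-Za-z0-9._-]+$", symbols)
  if (!all(ok)) {
    stop("unexpected characters in: ", paste(symbols[!ok], collapse = ", "))
  }
  in_list <- paste0("'", symbols, "'", collapse = ",")
  paste0("select distinct name2, chrom, txStart, txEnd from refGene ",
         "where name2 in (", in_list, ");")
}

gns <- c("BRIP1", "VEGFA", "FANCB", "TP53")
sql <- build_refgene_query(gns)
cat(sql, "\n")
# With an open connection, as in the session above: dbGetQuery(con, sql)
```

Genes whose transcripts have different txEnd values (TP53 here) would
still come back as more than one row, since "distinct" only collapses
identical rows.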

Or you could use the org.Hs.eg.db package supplied by BioC:

 > library(org.Hs.eg.db)
 > egs <- unlist(mget(gns, revmap(org.Hs.egSYMBOL)))
 > egs
   BRIP1   VEGFA   FANCB    TP53
"83990"  "7422"  "2187"  "7157"
 > starts <- unlist(mget(egs, org.Hs.egCHRLOC))
 > ends <- unlist(mget(egs, org.Hs.egCHRLOCEND))
## two end locations for TP53, so double up the symbol
 > data.frame(gns=gns[c(1:4,4)], starts, ends)
     gns    starts      ends
1 BRIP1 -57114766 -57295537
2 VEGFA  43845930  43862201
3 FANCB -14771449 -14801105
4  TP53  -7512444  -7531588
5  TP53  -7512444  -7519536
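
The negative values above are not errors: in the CHRLOC and CHRLOCEND
maps a leading minus sign marks a gene on the antisense (minus) strand.
As a small sketch, the sign can be split into a separate strand column.
The numbers are copied from the output above, and the column name
"strand" is just my choice:

```r
# Values copied from the session output above.
loc <- data.frame(
  gns    = c("BRIP1", "VEGFA", "FANCB", "TP53", "TP53"),
  starts = c(-57114766, 43845930, -14771449, -7512444, -7512444),
  ends   = c(-57295537, 43862201, -14801105, -7531588, -7519536)
)

# A negative coordinate means minus strand; keep the sign as its own
# column and store the positions as positive numbers.
loc$strand <- ifelse(loc$starts < 0, "-", "+")
loc$starts <- abs(loc$starts)
loc$ends   <- abs(loc$ends)
loc
```

The chromosome itself is not in this table. The org.Hs.egCHR map in the
same package should supply it for the same Entrez Gene IDs.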

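Or you could use biomaRt. Note that the coordinates returned below
differ from the hg18 values above, presumably because the default
Ensembl mart is based on a different genome assembly: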

 > library(biomaRt)
 > mart <- useMart("ensembl", "hsapiens_gene_ensembl")
Checking attributes ... ok
Checking filters ... ok
 > getBM(c("hgnc_symbol","start_position","end_position"),
 +       "hgnc_symbol", gns, mart)
   hgnc_symbol start_position end_position
1       FANCB       14861529     14891184
2        TP53        7565257      7590863
3       VEGFA       43737948     43754224
4       BRIP1       59759985     59940755
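
Since the original question also asked for the chromosome, here is a
sketch of the same biomaRt call with two more attributes. It assumes
the Ensembl gene dataset exposes "chromosome_name" and "strand", and
fetch_locations() is a hypothetical wrapper of mine, not a biomaRt
function. Defining it needs neither the package nor a network
connection; calling it needs both:

```r
# Attributes to request. "chromosome_name" and "strand" are assumed to
# be available in the hsapiens_gene_ensembl dataset.
attrs <- c("hgnc_symbol", "chromosome_name",
           "start_position", "end_position", "strand")

# Hypothetical wrapper around biomaRt::getBM(), filtering on HGNC symbol.
fetch_locations <- function(symbols, mart) {
  biomaRt::getBM(attributes = attrs, filters = "hgnc_symbol",
                 values = symbols, mart = mart)
}

# Usage, given the `mart` and `gns` objects from the session above:
# fetch_locations(gns, mart)
```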

Best,

Jim


> 
> In the table browser on the UCSC website I can obtain this information by 
> pasting a list of identifiers, so the requested information must be 
> somewhere in the tables.
> 
> The solution I found is rather indirect: first get a table of all 
> UCSC names together with gene symbols, find the UCSC names that 
> correspond to my symbols, and then look those names up in a table of 
> all UCSC names with locations.
> 
> Thank you in advance,
> Christian

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


