[BioC] pulling functional information for SNPs

James W. MacDonald jmacdon at med.umich.edu
Wed Apr 28 22:42:28 CEST 2010


Hi Kay,

Kay Jaja wrote:
> Hi ,
> 
> I have a list of SNPs (rs numbers) and I am interested in pulling the functional data corresponding to each SNP from a database like Ensembl, i.e. the gene name if the SNP is in a gene, whether it is in an intron or exon, and whether it is a non-synonymous or synonymous SNP.
> Is it possible to do this in R using biomaRt or any other package?

Do you mean to ask if it is possible, or is it easy? It is certainly 
possible, although it depends on exactly what you want. Your question is 
not as complete as it could be. In the future, you should try to explain 
exactly what you are trying to do rather than asking open-ended questions.

You can get information about SNPs using biomaRt, but the available 
attributes look pretty sparse to me compared with the short list of 
things you seem to be interested in. You can look to see what is 
available easily enough:

library(biomaRt)
mart <- useMart("snp","hsapiens_snp")
listAttributes(mart)
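
If some of the listed attributes look useful, a minimal sketch of pulling them for a vector of rs IDs could look like the following. The helper fetch_snp_annotation() is hypothetical, and the default attribute and filter names are assumptions: they differ between Ensembl releases, so check them against listAttributes(mart) and listFilters(mart) before relying on them.

```r
## Sketch only: fetch_snp_annotation() is a hypothetical helper, and the
## default attribute and filter names are assumptions that should be checked
## against listAttributes(mart) and listFilters(mart).
fetch_snp_annotation <- function(rsid, mart,
                                 attributes = c("refsnp_id", "chr_name",
                                                "chrom_start", "allele",
                                                "consequence_type_tv",
                                                "ensembl_gene_stable_id"),
                                 filter = "snp_filter") {
  # The rs ID itself (refsnp_id) is among the attributes, so each returned
  # row can be tied back to the ID that produced it.
  biomaRt::getBM(attributes = attributes, filters = filter,
                 values = unique(rsid), mart = mart)
}

## usage, given the mart object created above:
## snp_info <- fetch_snp_annotation(c("rs25", "rs26"), mart)
```

All IDs go to the server in a single getBM() call rather than one call per SNP.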

There are one or two vignettes that come with biomaRt that should help 
you get started if you like what you see.

I generally don't use biomaRt for this sort of thing, instead preferring 
to hit the UCSC database directly. Note that what I show below might be 
done as easily using the rtracklayer package; you might explore the 
vignettes for that package as well. Anyway, I would use the RMySQL 
package and query directly:

library(RMySQL)
## connect to the public UCSC MySQL server (user "genome", no password)
con <- dbConnect(MySQL(), host = "genome-mysql.cse.ucsc.edu",
                 dbname = "hg18", user = "genome")

## what type of info is available?

 > dbGetQuery(con, "select * from snp129 where name='rs25';")
   bin chrom chromStart chromEnd name score strand refNCBI refUCSC observed
1 673  chr7   11550666 11550667 rs25     0      -       T       T      A/G
   molType  class                             valid    avHet  avHetSE   func
1 genomic single by-cluster,by-frequency,by-hapmap 0.499586 0.014383 intron
   locType weight
1   exact      1

Note two things here. First, the rows are not guaranteed to come back in 
the order you asked for them, so you should always ask the database to 
return the field you are querying on as well (this is true of biomaRt 
too). Second, if you are querying lots of SNPs, do it in one big query 
instead of one by one. Repeatedly querying an online database can get 
you banned. So say your rs IDs are in a vector rsid, and you want the 
chromosome, the position, the bases, and the function (intron, exon, 
intragenic, etc).

sql <- paste("select name, chrom, chromEnd, observed, func from snp129 where name in ('",
             paste(rsid, collapse = "','"), "');", sep = "")

There are a lot of ' and " in there, because we want something that 
looks like this:

select name, chrom, chromEnd, observed, func from snp129 where name in 
('rs25','rs26','rs27','rs28');

so you want to make sure the sql statement looks OK first. Then just do

dat <- dbGetQuery(con, sql)

 > rsid <- c("rs25","rs26","rs27","rs28")
 > rsid
[1] "rs25" "rs26" "rs27" "rs28"
 > sql <- paste("select name, chrom, chromEnd, observed, func from snp129 where name in ('", paste(rsid, collapse = "','"), "');", sep = "")
 > sql
[1] "select name, chrom, chromEnd, observed, func from snp129 where name in ('rs25','rs26','rs27','rs28');"
 > z <- dbGetQuery(con, sql)
 > z
   name chrom chromEnd observed   func
1 rs25  chr7 11550667      A/G intron
2 rs26  chr7 11549996    -/A/G intron
3 rs27  chr7 11549750      C/G intron
4 rs28  chr7 11562590      A/G intron
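
The two points above (one big query, and not trusting the return order) can be wrapped up in a couple of small helpers. This is only a sketch: build_snp_query() and align_to_input() are hypothetical names, not functions from RMySQL or DBI, and the check that every ID looks like rs followed by digits is my own assumption about what the input should contain.

```r
## Sketch only: build_snp_query() and align_to_input() are hypothetical
## helpers, not part of RMySQL or DBI.

# Build one "where name in (...)" query for a whole vector of rs IDs.
build_snp_query <- function(rsid, table = "snp129") {
  rsid <- unique(as.character(rsid))
  # Accept only IDs of the form rs<digits>, so nothing unexpected ends up
  # inside the quoted list.
  bad <- rsid[!grepl("^rs[0-9]+$", rsid)]
  if (length(bad) > 0) {
    stop("not valid rs IDs: ", paste(bad, collapse = ", "))
  }
  paste0("select name, chrom, chromEnd, observed, func from ", table,
         " where name in ('", paste(rsid, collapse = "','"), "');")
}

# Put the returned rows back into the order of the input vector.
# IDs the database did not return become rows of NA; if an ID is returned
# more than once, only its first row is kept.
align_to_input <- function(result, rsid) {
  out <- result[match(rsid, result$name), , drop = FALSE]
  out$name <- rsid
  rownames(out) <- NULL
  out
}

rsid <- c("rs25", "rs26", "rs27", "rs28")
sql <- build_snp_query(rsid)
## with an open connection:
## z <- align_to_input(dbGetQuery(con, sql), rsid)
```

Because name is part of the select, match() can line the result up with rsid no matter what order the server returns the rows in.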

Best,

Jim



> 
> I appreciate your help,
> thanks

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826