[BioC] SNP to Gene mapping (GWAS)

Vincent Carey stvjc at channing.harvard.edu
Sun Oct 25 16:56:40 CET 2009


I don't quite follow this question.  Perhaps folks more expert in
annotation and versioning can comment.  It seems you would want to
swap out the CHRLOC information in my function for something else, and
various things that are quite new in the Biostrings/GenomicFeatures
packages should help with increasing flexibility for this part of the
task.  Some sort of very high level computation like an intersection
of a snp set and a gene set should be possible to meet your objective,
but I don't have the constructs under my belt yet.

On Sun, Oct 25, 2009 at 9:19 AM, Tim Smith <tim_smith_666 at yahoo.com> wrote:
> Thanks for that Vincent.
> The SNPs that I am using are associated with the NCBI build 36 (UCSC -
> hg18). However, if I use the 'org.Hs.eg.db' package, I might not get the
> right NCBI build.
> I had used the biomaRt package to get an archived build before, but I am not
> sure how to go about it for the code you suggested.
> thanks again!
> ________________________________
> From: Vincent Carey <stvjc at channing.harvard.edu>
> To: Tim Smith <tim_smith_666 at yahoo.com>
> Cc: bioc <bioconductor at stat.math.ethz.ch>
> Sent: Wed, October 21, 2009 3:33:53 PM
> Subject: Re: [BioC] SNP to Gene mapping (GWAS)
>
> It is not really a well-posed question, but various resources exist.
>
> If you have rs numbers for SNPs, you can determine their genomic
> coordinates using the SNPlocs.Hsapiens.dbSNP.* metadata packages --
> the latest uses build 130.  You can then use the org.Hs.eg.db package
> to determine CHR, CHRLOC and CHRLOCEND using the Entrez definitions.
>
> A rather limited solution can be had if you put the following in your
> workspace
>
> rs2eg = function (rsid, chr, radius = 1e+05)
> {
>     require(org.Hs.eg.db)
>     if (!exists("getSNPlocs"))
>         stop("you need a SNPlocs.Hsapiens.dbSNP.* package loaded")
>     if (length(grep("chr", chr)) <= 0)
>         stop("chr must be in form \"chr...\"")
>     numid = as.numeric(gsub("rs", "", rsid))
>     snlocs = getSNPlocs(chr)
>     loc = snlocs[snlocs[, 1] == numid, "loc"]
>     cstr = gsub("chr", "", chr)
>     if (!(cstr %in% keys(revmap(org.Hs.egCHR))))
>         stop("your chr is bad; check keys(org.Hs.egCHR)")
>     gonc = get(cstr, revmap(org.Hs.egCHR))
>     allloc = mget(gonc, org.Hs.egCHRLOC)
>     disamb = sapply(allloc, function(x) abs(as.numeric(x)[1]))
>     allloc = na.omit(disamb)
>     ok = which(abs(loc - allloc) <= radius)
>     if (length(ok) > 0)
>         return(names(allloc)[ok])
>     return(NULL)
> }
>
>> rs2eg("rs6060535", "chr20")
> [1] "6676"  "8904"  "9054"  "9584"  "10137"  "80307"  "140823"
>
> These are genes that have CHRLOC address within 100kb of the snp rs6060535.
> Other resources for genomic annotation are emerging in the
> IRanges/rtracklayer packages and these could surely be relevant for
> large-scale solutions.  Note that the program given here is not
> vectorized but it would not be hard to extend it to something that is.
>
> Of course if you find attributes that are more pertinent to your
> question in the biomaRt facility, that might be better.  Let's check:
>
>> mart = useMart("snp", dataset="hsapiens_snp")
> Checking attributes ... ok
> Checking filters ... ok
>> getBM(mart=mart, filters="refsnp", value="rs6060535",
> +  attr=c("chr_name", "chrom_start", "associated_gene",
> "ensembl_gene_stable_id"))
>   chr_name chrom_start associated_gene ensembl_gene_stable_id
> 1      20    34235522              NA        ENSG00000214078
> 2      20    34235522              NA        ENSG00000222460
> 3      20    34235522              NA        ENSG00000244462
>> mget(.Last.value[,4], org.Hs.egENSEMBL2EG, ifnotfound=NA)
> $ENSG00000214078
> [1] "8904"  "9054"  "10137"
>
> $ENSG00000222460
> [1] NA
>
> $ENSG00000244462
> [1] NA
>
> This result is compatible with the rs2eg function in that it
> identifies stably named ensembl genes (but no 'associated' genes, sic)
> that coincide with the entrez genes found by my distance-parameterized
> search.
>
>> sessionInfo()
> R version 2.10.0 beta (2009-10-14 r50081)
> i386-apple-darwin9.8.0
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats    graphics  grDevices datasets  tools    utils    methods
> [8] base
>
> other attached packages:
> [1] org.Hs.eg.db_2.3.6  RSQLite_0.7-3        DBI_0.2-4
> [4] AnnotationDbi_1.7.20 Biobase_2.5.8        biomaRt_2.1.0
> [7] weaver_1.11.1        codetools_0.2-2      digest_0.4.1
>
> loaded via a namespace (and not attached):
> [1] RCurl_1.2-1 XML_2.6-0
>
>
>
> On Wed, Oct 21, 2009 at 2:54 PM, Tim Smith <tim_smith_666 at yahoo.com> wrote:
>> Hi,
>>
>> I had a list of SNPs that I wanted to map to genes. Is there any package
>> in bioconductor that will let me do this? I looked at GenABEL, but it
>> doesn't seem that I can do it with this package.
>> thanks!
>>
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>



More information about the Bioconductor mailing list