[BioC] SNP to Gene mapping (GWAS)

Vincent Carey stvjc at channing.harvard.edu
Wed Oct 21 21:33:53 CEST 2009


It is not really a well-posed question, but various resources exist.

If you have rs numbers for SNPs, you can determine their genomic
coordinates using the SNPlocs.Hsapiens.dbSNP.* metadata packages --
the latest uses build 130.  You can then use the org.Hs.eg.db package
to determine CHR, CHRLOC and CHRLOCEND using the Entrez definitions.

 A rather limited solution can be had if you put the following in your workspace

rs2eg = function (rsid, chr, radius = 1e+05)
{
    require(org.Hs.eg.db)
    if (!exists("getSNPlocs"))
        stop("you need a SNPlocs.Hsapiens.dbSNP.* package loaded")
    if (length(grep("chr", chr)) <= 0)
        stop("chr must be in form \"chr...\"")
    numid = as.numeric(gsub("rs", "", rsid))
    snlocs = getSNPlocs(chr)
    loc = snlocs[snlocs[, 1] == numid, "loc"]
    cstr = gsub("chr", "", chr)
    if (!(cstr %in% keys(revmap(org.Hs.egCHR))))
        stop("your chr is bad; check keys(org.Hs.egCHR)")
    gonc = get(cstr, revmap(org.Hs.egCHR))
    allloc = mget(gonc, org.Hs.egCHRLOC)
    disamb = sapply(allloc, function(x) abs(as.numeric(x)[1]))
    allloc = na.omit(disamb)
    ok = which(abs(loc - allloc) <= radius)
    if (length(ok) > 0)
        return(names(allloc)[ok])
    return(NULL)
}

> rs2eg("rs6060535", "chr20")
[1] "6676"   "8904"   "9054"   "9584"   "10137"  "80307"  "140823"

These are genes that have CHRLOC address within 100kb of the snp rs6060535.
Other resources for genomic annotation are emerging in the
IRanges/rtracklayer packages and these could surely be relevant for
large-scale solutions.  Note that the program given here is not
vectorized but it would not be hard to extend it to something that is.

Of course if you find attributes that are more pertinent to your
question in the biomaRt facility, that might be better.  Let's check:

> mart = useMart("snp", dataset="hsapiens_snp")
Checking attributes ... ok
Checking filters ... ok
> getBM(mart=mart, filters="refsnp", value="rs6060535",
+   attr=c("chr_name", "chrom_start", "associated_gene",
"ensembl_gene_stable_id"))
  chr_name chrom_start associated_gene ensembl_gene_stable_id
1       20    34235522              NA        ENSG00000214078
2       20    34235522              NA        ENSG00000222460
3       20    34235522              NA        ENSG00000244462
> mget(.Last.value[,4], org.Hs.egENSEMBL2EG, ifnotfound=NA)
$ENSG00000214078
[1] "8904"  "9054"  "10137"

$ENSG00000222460
[1] NA

$ENSG00000244462
[1] NA

This result is compatible with the rs2eg function in that it
identifies stably named ensembl genes (but no 'associated' genes, sic)
that coincide with the entrez genes found by my distance-parameterized
search.

> sessionInfo()
R version 2.10.0 beta (2009-10-14 r50081)
i386-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices datasets  tools     utils     methods
[8] base

other attached packages:
[1] org.Hs.eg.db_2.3.6   RSQLite_0.7-3        DBI_0.2-4
[4] AnnotationDbi_1.7.20 Biobase_2.5.8        biomaRt_2.1.0
[7] weaver_1.11.1        codetools_0.2-2      digest_0.4.1

loaded via a namespace (and not attached):
[1] RCurl_1.2-1 XML_2.6-0



On Wed, Oct 21, 2009 at 2:54 PM, Tim Smith <tim_smith_666 at yahoo.com> wrote:
> Hi,
>
> I had a list of SNPs that I wanted to map to genes. Is there any package in bioconductor that will let me do this? I looked at GenABEL, but it doesn't seem that I can do it with this package.
> thanks!
>
>
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list