[BioC] get chr position for a batch of human SNPs

shirley zhang shirley0818 at gmail.com
Thu Sep 29 15:49:42 CEST 2011


Dear Hervé,

Thank you very much. This is great news! I will try the new function
rsidsToGRanges(). I expect to use it very often in my work to
extract SNP information for a set of rs ids.

Thanks again,
Shirley

2011/9/28 Hervé Pagès <hpages at fhcrc.org>:
> Hi Shirley,
>
> On 11-09-22 02:24 PM, Hervé Pagès wrote:
>>
>> Hi Shirley,
>>
>> On 11-09-22 11:29 AM, shirley zhang wrote:
>>>
>>> Dear All,
>>>
>>> I am planning to map SNP ids to hg18 positions (chr and position)
>>> for a huge list of human SNPs. I've tried the package
>>> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this
>>> package:
>>>
>>> 1. Are the SNPs in this package mapped to the hg18 genome (NCBI Build 36.3
>>> with Group Label "reference" rather than "Celera" or "HuRef")?
>>
>> Yes, they are mapped to hg18. See:
>>
>> http://bioconductor.org/packages/release/data/annotation/html/SNPlocs.Hsapiens.dbSNP.20090506.html
>>
>> and the man page of the package for additional details:
>>
>>  > library(SNPlocs.Hsapiens.dbSNP.20090506)
>>  > ?SNPlocs.Hsapiens.dbSNP.20090506
>>
>>>
>>> 2. If I don't know the chr information (seqname), can I obtain the
>>> position with dbSNP Id only?
>>
>> Unfortunately, because SNPs are stored in one data frame per
>> chromosome, if you don't know the chr then you need to load and
>> query each data frame individually.
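
A minimal sketch of that chromosome-by-chromosome lookup, in base R only.
The list snps_by_chr, its toy values, the column name loc and the helper
find_snps() are made up for illustration; with the real package each table
would instead be loaded from SNPlocs.Hsapiens.dbSNP.20090506, one
chromosome at a time.

```r
## Toy stand-ins for the per-chromosome SNP tables (hypothetical data,
## not taken from any SNPlocs package).
snps_by_chr <- list(
  chr1 = data.frame(RefSNP_id = c("101", "102"), loc = c(1500L, 2600L),
                    stringsAsFactors = FALSE),
  chr2 = data.frame(RefSNP_id = c("201", "202"), loc = c(900L, 4100L),
                    stringsAsFactors = FALSE)
)

## Scan every chromosome table and keep the rows matching the query ids.
## Returns NULL when none of the ids is found on any chromosome.
find_snps <- function(rs_ids, tables) {
  ids <- sub("^rs", "", rs_ids)   # the toy tables store ids without "rs"
  hits <- lapply(names(tables), function(chr) {
    tab <- tables[[chr]]
    tab <- tab[tab$RefSNP_id %in% ids, , drop = FALSE]
    if (nrow(tab) == 0L) return(NULL)
    data.frame(chr = chr, rs_id = paste0("rs", tab$RefSNP_id),
               pos = tab$loc, stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)            # NULL entries are dropped by rbind()
}

res <- find_snps(c("rs202", "rs101"), snps_by_chr)
print(res)
```

The result is ordered by chromosome rather than by the order of the query
ids, and every table is scanned even after a hit, which is why this
approach gets slow for a long list of ids.
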
>>
>> With more recent SNPlocs packages (e.g.
>> SNPlocs.Hsapiens.dbSNP.20100427), provision was added to
>> let the user load SNPs from more than 1 chromosome in a single
>> GRanges object, so you can do something like:
>>
>> ## Load all the SNPs in a big GRanges object (takes about
>> ## 13 minutes and requires 6GB of RAM!):
>> all_snps <- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE)
>>
>> ## Use the rs ids to set the names (takes about 6 minutes):
>> names(all_snps) <- paste("rs", elementMetadata(all_snps)$RefSNP_id,
>>                          sep="")
>>
>> ## Then extract your SNPs from the big GRanges object (again,
>> ## this can take a long time, depending on how many SNPs you
>> ## extract):
>> my_rs_ids <- sample(names(all_snps), 1000)
>> my_snps <- all_snps[my_rs_ids]
>>
>> However, please note that, starting with
>> SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131),
>> SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18.
>
> I've made some improvements to the SNPlocs packages. The improved
> packages are in BioC 2.9 (current devel, soon to be released) and
> had their versions bumped to 0.99.6. They are:
>
>  SNPlocs.Hsapiens.dbSNP.20090506: dbSNP Build 130, based on hg18
>  SNPlocs.Hsapiens.dbSNP.20100427: dbSNP Build 131, based on hg19
>  SNPlocs.Hsapiens.dbSNP.20101109: dbSNP Build 132, based on hg19
>  SNPlocs.Hsapiens.dbSNP.20110815: dbSNP Build 134, based on hg19
>
> Only the source packages are available at the moment but they should
> be installable on Windows/Mac from R-2.14 with
>
>  biocLite(..., type="source").
>
> The data in those packages have not changed but are organized more
> efficiently. There is a new function rsidsToGRanges() to extract
> SNP information for a set of rs ids. Compared with the code I
> provided above, rsidsToGRanges() uses about 1/10 of the memory and
> is about 40x faster. See ?rsidsToGRanges for the details.
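
A hedged sketch of what a call might look like. It assumes that
SNPlocs.Hsapiens.dbSNP.20110815 (0.99.6 or later) is installed, that
rsidsToGRanges() accepts a character vector of ids carrying the "rs"
prefix, and that the two example ids are present in that build;
?rsidsToGRanges is the authority on all three points.

```r
## Sketch only: the lookup is skipped when the package is not installed.
pkg <- "SNPlocs.Hsapiens.dbSNP.20110815"

## Example ids chosen for illustration (assumed to be in dbSNP Build 134).
my_rs_ids <- c("rs334", "rs7412")

my_snps <- NULL
if (suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))) {
  ## Expected to return a GRanges object with hg19 coordinates.
  my_snps <- rsidsToGRanges(my_rs_ids)
  print(my_snps)
} else {
  message(pkg, " is not installed; skipping the lookup")
}
```

For hg18 positions the same call would be made after loading
SNPlocs.Hsapiens.dbSNP.20090506 instead.
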
>
> Let me know if you have questions.
>
> Cheers,
> H.
>
>>
>> Hope this helps,
>> H.
>>
>>>
>>> Further, I find dbSNP batch queries a little harder to work
>>> with because they map to different versions of hg18, such as Celera,
>>> HumanRef, etc. Can anybody suggest a better way to get hg18 chr
>>> positions from the most widely used or most reliable version of dbSNP?
>>>
>>> Thanks in advance
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>



-- 
Xiaoling (Shirley) Zhang

M.D., Ph.D. (Bioinformatics)
Boston University, Boston, MA
Tel: (857) 233-9862
Email: zhangxl at bu.edu


