[BioC] get chr position for a batch of human SNPs

Hervé Pagès hpages at fhcrc.org
Fri Sep 23 00:19:38 CEST 2011


On 11-09-22 02:38 PM, Tim Triche, Jr. wrote:
> Is it possible to liftOver within R?

Sure. See that thread on the Bioconductor mailing list from
May 2011, and in particular Michael's answer:

   https://stat.ethz.ch/pipermail/bioconductor/2011-May/039254.html
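The newer SNPlocs packages report GRCh37 (hg19) positions, as the quoted reply below notes. One way to bring such positions back to hg18 within R is the liftOver() function from rtracklayer. The sketch below assumes that rtracklayer is installed and that a UCSC chain file such as hg19ToHg18.over.chain has been downloaded and uncompressed locally. lift_granges() is a hypothetical helper name, and this is not necessarily the approach given in the linked answer.

```r
## Minimal sketch, assuming the rtracklayer package (Bioconductor) and a
## local, uncompressed UCSC chain file (e.g. hg19ToHg18.over.chain).
## lift_granges() is a hypothetical helper, not part of any package.
lift_granges <- function(gr, chain_file) {
  if (!requireNamespace("rtracklayer", quietly = TRUE))
    stop("the rtracklayer package is required")
  ## Read the chain file describing how hg19 maps onto hg18
  chain <- rtracklayer::import.chain(chain_file)
  ## liftOver() returns a GRangesList: one element per input range,
  ## empty when the range could not be mapped
  rtracklayer::liftOver(gr, chain)
}

## Usage (not run):
## library(GenomicRanges)
## gr <- GRanges("chr1", IRanges(1000000, width = 1))
## lifted <- lift_granges(gr, "hg19ToHg18.over.chain")
```

liftOver() returns a GRangesList rather than a GRanges because a single input range can map to zero, one, or several pieces in the target assembly.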

H.

PS: I don't know why the messages in this thread don't show up in the
archive page; only a URL to each message is shown, so one has to open
each message in a separate window or tab, which is a pain :-/ Maybe the
Mailman web interface could be configured not to do that?

>
> --t
>
> On Sep 22, 2011, at 2:24 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>> Hi Shirley,
>>
>> On 11-09-22 11:29 AM, shirley zhang wrote:
>>> Dear All,
>>>
>>> I am planning to map the SNP ids to hg18 positions (chr and position)
>>> for a huge list of human SNPs. I've tried the package
>>> "SNPlocs.Hsapiens.dbSNP.20090506" and have 2 questions regarding this
>>> package:
>>>
>>> 1. Do the SNPs in this package map to the hg18 genome (NCBI Build 36.3,
>>> with Group Label "reference" rather than "Celera" or "HuRef")?
>>
>> Yes, they are mapped to hg18. See:
>>
>> http://bioconductor.org/packages/release/data/annotation/html/SNPlocs.Hsapiens.dbSNP.20090506.html
>>
>> and the man page of the package for additional details:
>>
>>   >  library(SNPlocs.Hsapiens.dbSNP.20090506)
>>   >  ?SNPlocs.Hsapiens.dbSNP.20090506
>>
>>>
>>> 2. If I don't know the chr information (seqname), can I obtain the
>>> position with dbSNP Id only?
>>
>> Unfortunately, because SNPs are stored in one data frame per
>> chromosome, if you don't know the chr then you need to load and
>> query each data frame individually.
>>
>> With more recent SNPlocs packages (e.g.
>> SNPlocs.Hsapiens.dbSNP.20100427), provision was added to
>> let the user load SNPs from more than 1 chromosome in a single
>> GRanges object, so you can do something like:
>>
>>   library(SNPlocs.Hsapiens.dbSNP.20100427)
>>
>>   ## Load all the SNPs in a big GRanges object (takes about
>>   ## 13 minutes and requires 6GB of RAM!):
>>   all_snps <- getSNPlocs(names(getSNPcount()), as.GRanges=TRUE)
>>
>>   ## Use the rs ids to set the names (takes about 6 minutes):
>>   names(all_snps) <- paste("rs", elementMetadata(all_snps)$RefSNP_id,
>>                            sep="")
>>
>>   ## Then extract your SNPs from the big GRanges object (again,
>>   ## this can take a long time, depending on how many SNPs you
>>   ## extract). Here 1000 rs ids are picked at random for
>>   ## illustration; use your own vector of rs ids instead:
>>   my_rs_ids <- sample(names(all_snps), 1000)
>>   my_snps <- all_snps[my_rs_ids]
>>
>> However, please note that, starting with
>> SNPlocs.Hsapiens.dbSNP.20100427 (i.e. dbSNP Build 131),
>> SNPs are mapped to GRCh37 (UCSC hg19) instead of hg18.
>>
>> Hope this helps,
>> H.
>>
>>>
>>> Further, I find dbSNP batch queries a little more difficult to work
>>> with because they return positions on several Build 36 assemblies
>>> (the reference assembly, Celera, HuRef, etc.). Can anybody suggest a
>>> better way to get hg18 chr positions from the most widely used or
>>> most reliable version of dbSNP?
>>>
>>> Thanks in advance
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
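Regarding question 2 in the quoted exchange (finding positions for rs ids when the chromosome is unknown), the per-chromosome lookup described there can be sketched in R as follows. This assumes an old-style SNPlocs package in which getSNPlocs(chr) returns one data frame per chromosome with RefSNP_id and loc columns, and in which ids are stored without the "rs" prefix (consistent with the paste("rs", ...) call in the quoted code). find_snp_positions() is a hypothetical helper; passing the loader as an argument keeps only one chromosome in memory at a time.

```r
## Minimal sketch, assuming getSNPlocs(chr) returns a data frame with
## RefSNP_id (no "rs" prefix) and loc columns, one chromosome at a time.
## find_snp_positions() is a hypothetical helper, not part of any package.
find_snp_positions <- function(rs_ids, seqnames, load_chr) {
  ids <- sub("^rs", "", rs_ids)              # strip any "rs" prefix
  hits <- lapply(seqnames, function(chr) {
    snps <- load_chr(chr)                    # load one chromosome
    keep <- as.character(snps$RefSNP_id) %in% ids
    if (!any(keep)) return(NULL)             # nothing requested on this chr
    data.frame(rs_id = paste("rs", snps$RefSNP_id[keep], sep = ""),
               chr = chr,
               loc = snps$loc[keep],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)                       # NULL entries are dropped
}

## Usage with the real package (not run):
## library(SNPlocs.Hsapiens.dbSNP.20090506)
## res <- find_snp_positions(my_rs_ids, names(getSNPcount()), getSNPlocs)
```

The function returns NULL when none of the requested ids is found on any of the given chromosomes.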

