[BioC] dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617

Lin Tang ltang at scmmlab.com
Thu Jun 4 20:04:53 CEST 2009


Thanks all for the discussion. Really looking forward to the updated package!

Lin

-----Original Message-----
From: Hervé Pagès [mailto:hpages at fhcrc.org] 
Sent: Thursday, June 04, 2009 10:59 AM
To: James W. MacDonald
Cc: Lin Tang; bioconductor
Subject: Re: [BioC] dbSNP build for R package SNPlocs.Hsapiens.dbSNP.20080617

Hi Jim,

James W. MacDonald wrote:
> Hi Herve,
> 
> I've been dealing with these data myself recently, and can confirm that 
> the data in March were build 129. They put the build 130 data up in 
> early May.
> 
> As a side note, build 129 is known to be problematic, as there are 
> multiple RS numbers that map to the same location:
> 
> http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000082.html 
> 

Indeed:

   > library(SNPlocs.Hsapiens.dbSNP.20080617)
   > data(chr1_snplocs)
   > sum(duplicated(chr1_snplocs$loc))
   [1] 413
   > which(duplicated(chr1_snplocs$loc))[1:10]
    [1]  2822  3030  9547 10865 12604 12641 16854 17898 21175 21977
   > chr1_snplocs[chr1_snplocs$loc == chr1_snplocs$loc[2822], ]
        RefSNP_id alleles_as_ambig     loc
   2821   3766175                D 1476802
   2822  59009700                W 1476802

This puzzled me when I first started working on the SNPlocs.* packages
(I saw the same thing in Build 128).
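
Note that duplicated() flags only the second and later occurrences of a
location, so the count above does not include the first row of each
collision. A minimal sketch of how to pull out every row that shares its
location with another one is shown below. It is only an illustration: the
toy data frame stands in for chr1_snplocs, its first two rows are copied
from the session above, and the remaining rows (ids and locations) are
made up.

```r
# Toy stand-in for chr1_snplocs. Rows 1-2 come from the session above;
# rows 3-4 are hypothetical and only there to provide non-colliding entries.
snplocs <- data.frame(
  RefSNP_id        = c(3766175L, 59009700L, 1L, 2L),
  alleles_as_ambig = c("D", "W", "R", "Y"),
  loc              = c(1476802L, 1476802L, 1500000L, 1600000L),
  stringsAsFactors = FALSE
)

# Combine a forward and a backward scan so that the first occurrence of
# each duplicated location is flagged as well.
shared <- duplicated(snplocs$loc) | duplicated(snplocs$loc, fromLast = TRUE)
dup_rows <- snplocs[shared, ]

# Group the RefSNP ids by the location they share.
ids_by_loc <- split(dup_rows$RefSNP_id, dup_rows$loc)
```

On the real data the same three statements can be applied to chr1_snplocs
directly; ids_by_loc then has one element per colliding location.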

> 
> According to their help team, this problem has been resolved in build 130.

Good. I'll make a new SNPlocs.Hsapiens.dbSNP.* from this build.

Thanks!
H.

> 
> Best,
> 
> Jim
> 
> 
> 
> Hervé Pagès wrote:
>> Hi Lin,
>>
>> I'm cc'ing the BioC list so other users might benefit from this.
>>
>> Lin Tang wrote:
>>> Dear Dr. Pages,
>>>
>>>  
>>>
>>>
>>>   I am currently using the R package SNPlocs.Hsapiens.dbSNP.20080617
>>>   and would like to check with you whether it corresponds to dbSNP
>>>   build 129. Since the package was released two months after dbSNP
>>>   build 129, that seems logical, but I would like to have it confirmed
>>>   by you. I'd appreciate your reply on this. Thanks!
>>
>> It's hard to tell.
>>
>> According to these pages:
>>   http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2008q2/000081.html
>>   http://www.ncbi.nlm.nih.gov/projects/SNP/buildhistory.cgi
>> Build 129 was released in April 2008 (note that the exact dates given on
>> these 2 pages don't match).
>>
>> A similar search shows that Build 130 was released about 1 month ago.
>>
>> So at the time I downloaded the ds_flat_ch*.flat files from here
>>   ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat
>> in order to build SNPlocs.Hsapiens.dbSNP.20080617 (that was in March
>> 2009), I assume that these files were a dump from Build 129.
>>
>> Note that the files under
>>   ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat
>> can change at any time (and today they are indeed different from what they
>> were back in March). It's a sad thing that the SNP team at NCBI doesn't
>> provide permanent URLs for their past builds. And it doesn't help that
>> the ds_flat_ch*.flat files they provide don't contain any information
>> about the build that they're coming from.
>>
>> Anyway, in the future I'll put the Build information in the DESCRIPTION
>> file of the SNPlocs packages.
>>
>> One last note. According to the SNP team at NCBI "Human SNPs in Build 129
>> are mapping to NCBI build 36.3". That is, to our BSgenome.Hsapiens.UCSC.hg18
>> package. According to UCSC, hg18 is NCBI Build 36.1 but NCBI Build 36.1 and
>> NCBI Build 36.3 are identical from a *sequence* point of view (I think what
>> makes them different are the annotations provided by NCBI).
>> This means that, if you are planning to inject SNPlocs.Hsapiens.dbSNP.20080617
>> in a genome, it only makes sense to do it with BSgenome.Hsapiens.UCSC.hg18.
>>
>> In the future we will put in place a mechanism to make this injection safer,
>> i.e. check that the injected stuff and the host are compatible.
>>
>> Cheers,
>> H.
>>
>>
>>>
>>>
>>>   Regards,
>>>
>>> Lin Tang, Ph.D.
>>>
>>> Scientist , Informatics | Sequenom  Inc.
>>>
>>> T: 1 858 202 9106 | F: 1 858 202 9084 | E: ltang at sequenom.com
>>>
>>>  
>>>
>>>
>>>
>>>
>>
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


