[BioC] building a SNPlocs data package [was: other human genomes, other SNP sets?]

Paul Shannon pshannon at systemsbiology.org
Thu Mar 5 14:13:48 CET 2009


'How to forge a BSgenome data package' answered my first question.

Is there a comparable write-up for building a SNPlocs data package?

All I have found is Herve's comments to Praveen on Feb 12 2009:

> The information in SNPlocs.Hsapiens.dbSNP.20071016 was retrieved
> from dbSNP, from this location to be precise:
>
>    ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/ASN1_flat/

That flat file also has SNP information from other assemblies (Celera and
HuRef), which is some of the information I want.
(See the example below for one SNP from ORM1 on chromosome 9.)

If I want this level of detail, should I parse the original file myself?
Are the parsing code and instructions for building a SNPlocs package available?

Thanks,

  - Paul


    rs1766074|human|9606|snp|genotype=NO|submitterlink=YES|updated 2004-10-04 13:51
    ss2622917|SC_JCM|AL356796.4_74652|orient=+|ss_pick=YES
    SNP|alleles='C/T'|het=?|se(het)=?
    VAL|validated=NO|min_prob=?|max_prob=?|notwithdrawn

    CTG|assembly=Celera|chr=9|chr-pos=87735017|NW_924573.1|ctg-start=1222848|ctg-end=1222848|loctype=2|orient=-
    LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
    LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2

    CTG|assembly=HuRef|chr=9|chr-pos=86692957|NW_001839236.2|ctg-start=2686784|ctg-end=2686784|loctype=2|orient=+
    LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=141|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
    LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=141|mrna_acc=NM_000607.2|prot_acc=NP_000598.2

    CTG|assembly=reference|chr=9|chr-pos=116127163|NT_008470.18|ctg-start=24408547|ctg-end=24408547|loctype=2|orient=-
    LOC|ORM1|locus_id=5004|fxn-class=coding-synonymous|allele=A|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
    LOC|ORM1|locus_id=5004|fxn-class=reference|allele=G|frame=3|residue=E|aa_position=149|mrna_acc=NM_000607.2|prot_acc=NP_000598.2
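
If hand-parsing does turn out to be the way to go, a minimal sketch of
pulling the per-assembly positions out of one record might look like the
following. It is in Python purely for illustration. `parse_record`, `SAMPLE`
and the dictionary keys are names invented here, and the field layout is
assumed from the single example record above (with the mail client's line
wraps undone), not from any dbSNP format specification.

```python
# Illustrative sketch only: the field layout is assumed from one example
# record, not taken from a dbSNP format specification.

SAMPLE = """\
rs1766074|human|9606|snp|genotype=NO|submitterlink=YES|updated 2004-10-04 13:51
ss2622917|SC_JCM|AL356796.4_74652|orient=+|ss_pick=YES
SNP|alleles='C/T'|het=?|se(het)=?
VAL|validated=NO|min_prob=?|max_prob=?|notwithdrawn
CTG|assembly=Celera|chr=9|chr-pos=87735017|NW_924573.1|ctg-start=1222848|ctg-end=1222848|loctype=2|orient=-
CTG|assembly=HuRef|chr=9|chr-pos=86692957|NW_001839236.2|ctg-start=2686784|ctg-end=2686784|loctype=2|orient=+
CTG|assembly=reference|chr=9|chr-pos=116127163|NT_008470.18|ctg-start=24408547|ctg-end=24408547|loctype=2|orient=-
"""


def parse_record(text):
    """Parse one rs record into its id, alleles and per-assembly placements."""
    record = {"rs_id": None, "alleles": None, "placements": []}
    for line in text.splitlines():
        # Fields are pipe-separated; strip() tolerates optional padding.
        fields = [f.strip() for f in line.split("|")]
        tag = fields[0]
        if not tag:
            continue  # blank line
        # Keep only key=value fields; bare fields (e.g. contig ids) are dropped.
        kv = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        if tag.startswith("rs") and tag[2:].isdigit():
            record["rs_id"] = tag
        elif tag == "SNP":
            record["alleles"] = kv.get("alleles", "").strip("'")
        elif tag == "CTG":
            pos = kv.get("chr-pos", "")
            record["placements"].append({
                "assembly": kv.get("assembly"),
                "chr": kv.get("chr"),
                # Non-numeric positions are stored as None.
                "chr_pos": int(pos) if pos.isdigit() else None,
                "orient": kv.get("orient"),
            })
        # Other line types (ss, VAL, LOC, ...) are ignored in this sketch.
    return record


record = parse_record(SAMPLE)
by_assembly = {p["assembly"]: p["chr_pos"] for p in record["placements"]}
```

Here `by_assembly` maps each assembly name (Celera, HuRef, reference) to the
chromosome position reported for it. Splitting the full flat file into
individual records is not shown.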




On Mar 4, 2009, at 5:12 AM, Paul Shannon wrote:

> Has anyone created a BSgenome object from the Venter (HuRef),
> Watson, or other recently completed sequencing projects?  Or SNPlocs  
> data packages for these genomes?
>
> If not, can you offer any advice or cautions to me as I attempt to  
> do so myself?
>
> Thanks -
>
> - Paul Shannon
>   Institute for Systems Biology
>   Seattle


