[BioC] SNP chip

Adrian Johnson oriolebaltimore at gmail.com
Thu Jul 8 23:26:30 CEST 2010


Hi Jim,

The idea is to compare the SNP chip calls to SNP calls derived from
sequencing data. I have both SNP chip and sequencing data for the same
samples, and I want to use that comparison to estimate sensitivity and
specificity.
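
To make the comparison concrete, here is a minimal sketch in base R. It
assumes, purely for illustration, that the chip call is treated as the
reference and that each site is reduced to "variant" or "not variant".
The two vectors are toy values, not real data.

```r
# Toy per-site flags (hypothetical): TRUE means a non-reference genotype was called
chip_variant <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)  # treated as truth (assumption)
seq_variant  <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)  # calls from sequencing

# Cells of the 2x2 confusion table
tp <- sum(chip_variant & seq_variant)    # variant on chip, also called by sequencing
fn <- sum(chip_variant & !seq_variant)   # variant on chip, missed by sequencing
tn <- sum(!chip_variant & !seq_variant)  # non-variant on both platforms
fp <- sum(!chip_variant & seq_variant)   # called by sequencing only

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
print(c(sensitivity = sensitivity, specificity = specificity))
```

Swapping which platform counts as the reference only swaps the roles of
the two vectors.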

From the sequencing data, I have a pileup file in a BED-like format with these columns:
Chromosome  From To  Reference Base    SNP call


I have no experience working with SNP chip data. All I have is data
that looks like this:
SNPID                   Call
SNP_A-8373748   BB
SNP_A-2210818   AB
SNP_A-4290346   BB
SNP_A-2219708   AA


There are two main questions:
1. I need to know where each SNP ID is located (chromosome and
position). From this I can deduce which base it corresponds to on the
genome.
2. I need to work out what BB, AB and AA mean in terms of actual bases.
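
Both steps can be sketched in base R, assuming an annotation table with
the columns man_fsetid, chrom, physical_pos, allele_a and allele_b (the
column names used in the query quoted below). The IDs and calls are the
ones from my table; the positions and alleles are made-up placeholders,
and translate_call is a hypothetical helper name. Chip alleles may be
reported on a different strand than the reference genome, so a strand
check would probably still be needed.

```r
# Chip calls (IDs and calls from the example table above)
calls <- data.frame(
  SNPID = c("SNP_A-8373748", "SNP_A-2210818", "SNP_A-2219708"),
  Call  = c("BB", "AB", "AA"),
  stringsAsFactors = FALSE
)

# Placeholder annotation: chromosome, position and alleles are NOT real values
annot <- data.frame(
  man_fsetid   = c("SNP_A-8373748", "SNP_A-2210818", "SNP_A-2219708"),
  chrom        = c("1", "1", "2"),
  physical_pos = c(1000L, 2000L, 3000L),
  allele_a     = c("A", "C", "A"),
  allele_b     = c("G", "T", "C"),
  stringsAsFactors = FALSE
)

# Step 1: attach chromosome, position and alleles to each call
merged <- merge(calls, annot, by.x = "SNPID", by.y = "man_fsetid")

# Step 2: replace each A/B letter in the call by the matching allele
translate_call <- function(call, a, b) {
  first  <- ifelse(substr(call, 1, 1) == "A", a, b)
  second <- ifelse(substr(call, 2, 2) == "A", a, b)
  paste0(first, second)
}
merged$genotype <- translate_call(merged$Call, merged$allele_a, merged$allele_b)

print(merged[, c("chrom", "physical_pos", "Call", "genotype")])
```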


Thanks
Adrian

On Thu, Jul 8, 2010 at 4:41 PM, James W. MacDonald
<jmacdon at med.umich.edu> wrote:
> Hi Adrian,
>
> On 7/8/2010 12:59 PM, Adrian Johnson wrote:
>>
>> Hi:
>>
>>
>> I have a snp array 6 (affymetrix) data table that looks like the
>> following:
>>
>>
>> SNPID                   Call
>> SNP_A-8373748   BB
>> SNP_A-2210818   AB
>> SNP_A-4290346   BB
>> SNP_A-2219708   AA
>>
>>
>> I want to be able to convert this data into a .BED file like format
>> that will look like following:
>
> Well, doing what you want will be difficult, as I don't know how you are
> going from genotype to 'Reference Base' and 'Call', nor do I really know
> what you mean by either of those in this situation. Anyway, that's a YP, not
> an MP (for you Boogie Nights fans out there ;-D).
>
> Here is, I think, what you need to know to get close to what you are trying
> to do.
>
>> library(pd.genomewidesnp.6)
>> con <- db(pd.genomewidesnp.6)
> ## fake up some IDs
>> ids <- dbGetQuery(con, "select man_fsetid from featureSet limit 20;")
>> ids
>      man_fsetid
> 1  SNP_A-2131660
> 2  SNP_A-1967418
> 3  SNP_A-1969580
> 4  SNP_A-4263484
> 5  SNP_A-1978185
> 6  SNP_A-4264431
> 7  SNP_A-1980898
> 8  SNP_A-1983139
> 9  SNP_A-4265735
> 10 SNP_A-1995832
> 11 SNP_A-1995893
> 12 SNP_A-1997689
> 13 SNP_A-1997709
> 14 SNP_A-1997896
> 15 SNP_A-1997922
> 16 SNP_A-2000230
> 17 SNP_A-2000332
> 18 SNP_A-2000337
> 19 SNP_A-2000342
> 20 SNP_A-4268173
>
> ## now a simple SQL query
>
>> dbGetQuery(con, paste("select chrom, physical_pos, allele_a, allele_b from
>> featureSet where man_fsetid in ('", paste(ids[,1], collapse="','"), "');",
>> sep = ""))
>   chrom physical_pos allele_a allele_b
> 1      1      2224111        A        G
> 2      1      2319424        A        G
> 3      1      2926730        C        T
> 4      1      3084986        C        G
> 5      1      3155127        A        C
> 6      1      3695086        C        G
> 7      1      3710825        A        G
> 8      1      3753024        A        G
> 9      1      3753427        A        G
> 10     1      3756100        A        G
> 11     1      3756146        A        C
> 12     1      4240737        A        G
> 13     1      4243294        C        G
> 14     1      4243405        A        C
> 15     1      4243441        C        T
> 16     1      1145994        C        T
> 17     1      2543484        C        T
> 18     1      2941694        C        T
> 19     1      3292731        C        T
> 20     1      4276892        C        T
>
> If you don't know any SQL, note that there is a mixture of " and ' in that
> paste statement, as we want to end up with a query that looks like this:
>
> "select chrom, physical_pos, allele_a, allele_b from featureSet where
> man_fsetid in
> ('SNP_A-2131660','SNP_A-1967418','SNP_A-1969580','SNP_A-4263484','SNP_A-1978185','SNP_A-4264431','SNP_A-1980898','SNP_A-1983139','SNP_A-4265735','SNP_A-1995832','SNP_A-1995893','SNP_A-1997689','SNP_A-1997709','SNP_A-1997896','SNP_A-1997922','SNP_A-2000230','SNP_A-2000332','SNP_A-2000337','SNP_A-2000342','SNP_A-4268173');"
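
The same query string can also be assembled by quoting each ID first and
then collapsing, which may be easier to read. This is a sketch in base R
using three of the IDs above; it only builds the string and does not
touch the database.

```r
# Three of the IDs from the listing above
ids <- c("SNP_A-2131660", "SNP_A-1967418", "SNP_A-1969580")

# Wrap every ID in single quotes, then join with commas
in_list <- paste0("'", ids, "'", collapse = ",")

# Double quotes delimit the R strings; single quotes end up inside the SQL
query <- paste0(
  "select chrom, physical_pos, allele_a, allele_b ",
  "from featureSet where man_fsetid in (", in_list, ");"
)
cat(query, "\n")
```

The resulting string can be passed to dbGetQuery(con, query) exactly as
in the quoted example.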
>
> Best,
>
> Jim
>
>
>
>>
>> Chromosome     Position        Reference Base       Call
>> chr19                2094894             A                        T
>> chr19                2095300             G                        A
>>
>>
>> Is it possible through bioconductor?  Thanks for your time.
>>
>> -Adrian
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> **********************************************************
> Electronic Mail is not secure, may not be read every day, and should not be
> used for urgent or sensitive issues
>


