[BioC] SNP chip
Adrian Johnson
oriolebaltimore at gmail.com
Thu Jul 8 23:26:30 CEST 2010
Hi Jim,
The idea is to compare the SNP chip calls with SNP calls derived from
sequencing data, so that I can estimate sensitivity and specificity.
I have both SNP chip and sequencing data for the same samples.
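As a rough sketch of that comparison (not an analysis of the real data),
sensitivity and specificity could be computed in R along these lines. The
vectors chip_variant and seq_variant are made-up toy values, and treating
the chip calls as the reference standard is an assumption; the roles can be
swapped if sequencing is taken as the reference instead.

```r
# Hypothetical toy data: one entry per site assayed by both platforms.
# TRUE means a variant (non-reference genotype) was called at that site.
chip_variant <- c(TRUE, TRUE,  FALSE, FALSE, TRUE, FALSE)
seq_variant  <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)

# Assumption: chip calls are the reference; sequencing calls are evaluated.
tp <- sum(seq_variant  &  chip_variant)  # variant on both platforms
fn <- sum(!seq_variant &  chip_variant)  # variant on chip, missed by sequencing
tn <- sum(!seq_variant & !chip_variant)  # no variant on either platform
fp <- sum(seq_variant  & !chip_variant)  # variant in sequencing only

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
```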
From the sequencing data, I have a pileup-format file in a BED-like format:
Chromosome From To Reference Base SNP call
I have no experience working with SNP chip data. All I have is data
that looks like this:
SNPID Call
SNP_A-8373748 BB
SNP_A-2210818 AB
SNP_A-4290346 BB
SNP_A-2219708 AA
I have two questions:
1. I need to know the chromosome and position of each SNP ID. From
this I can deduce which base it corresponds to on the genome.
2. I need to work out what BB, AB and AA mean and convert them to bases.
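Both steps could be sketched in R roughly as follows, assuming an
annotation table with the columns man_fsetid, chrom, physical_pos, allele_a
and allele_b, as in the query quoted below. The annotation values here are
invented for illustration, not real positions or alleles for these SNP IDs,
and the sketch assumes the A and B alleles are reported on the same strand
as the reference genome, which would need to be checked.

```r
# SNP chip calls in the format shown above.
calls <- data.frame(
  SNPID = c("SNP_A-8373748", "SNP_A-2210818", "SNP_A-4290346", "SNP_A-2219708"),
  Call  = c("BB", "AB", "BB", "AA"),
  stringsAsFactors = FALSE
)

# Hypothetical annotation rows (positions and alleles are made up).
annot <- data.frame(
  man_fsetid   = c("SNP_A-8373748", "SNP_A-2210818", "SNP_A-4290346", "SNP_A-2219708"),
  chrom        = c("1", "1", "2", "19"),
  physical_pos = c(1000L, 2000L, 3000L, 4000L),
  allele_a     = c("A", "C", "A", "C"),
  allele_b     = c("G", "T", "C", "T"),
  stringsAsFactors = FALSE
)

# Step 1: attach chromosome and position to each call by SNP ID.
merged <- merge(calls, annot, by.x = "SNPID", by.y = "man_fsetid")

# Step 2: translate each letter of the call (A or B) into the matching base.
first  <- ifelse(substr(merged$Call, 1, 1) == "A", merged$allele_a, merged$allele_b)
second <- ifelse(substr(merged$Call, 2, 2) == "A", merged$allele_a, merged$allele_b)
merged$genotype <- paste0(first, second)

# BED-style chromosome names such as "chr19".
merged$chromosome <- paste0("chr", merged$chrom)
```

Under these assumptions, a BB call at an A/G SNP becomes GG and an AB call
at a C/T SNP becomes CT.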
Thanks
Adrian
On Thu, Jul 8, 2010 at 4:41 PM, James W. MacDonald
<jmacdon at med.umich.edu> wrote:
> Hi Adrian,
>
> On 7/8/2010 12:59 PM, Adrian Johnson wrote:
>>
>> Hi:
>>
>>
>> I have a snp array 6 (affymetrix) data table that looks like the
>> following:
>>
>>
>> SNPID Call
>> SNP_A-8373748 BB
>> SNP_A-2210818 AB
>> SNP_A-4290346 BB
>> SNP_A-2219708 AA
>>
>>
>> I want to be able to convert this data into a .BED file like format
>> that will look like following:
>
> Well, doing what you want will be difficult, as I don't know how you are
> going from genotype to 'Reference Base' and 'Call', nor do I really know
> what you mean by either of those in this situation. Anyway, that's a YP, not
> an MP (for you Boogie Nights fans out there ;-D).
>
> Here is what I think you need to know to get close to what you are trying
> to do.
>
> > library(pd.genomewidesnp.6)
> > con <- db(pd.genomewidesnp.6)
> ## fake up some IDs
> > ids <- dbGetQuery(con, "select man_fsetid from featureSet limit 20;")
> > ids
> man_fsetid
> 1 SNP_A-2131660
> 2 SNP_A-1967418
> 3 SNP_A-1969580
> 4 SNP_A-4263484
> 5 SNP_A-1978185
> 6 SNP_A-4264431
> 7 SNP_A-1980898
> 8 SNP_A-1983139
> 9 SNP_A-4265735
> 10 SNP_A-1995832
> 11 SNP_A-1995893
> 12 SNP_A-1997689
> 13 SNP_A-1997709
> 14 SNP_A-1997896
> 15 SNP_A-1997922
> 16 SNP_A-2000230
> 17 SNP_A-2000332
> 18 SNP_A-2000337
> 19 SNP_A-2000342
> 20 SNP_A-4268173
>
> ## now a simple SQL query
>
> > dbGetQuery(con, paste("select chrom, physical_pos, allele_a, allele_b ",
> +     "from featureSet where man_fsetid in ('",
> +     paste(ids[,1], collapse="','"), "');", sep = ""))
> chrom physical_pos allele_a allele_b
> 1 1 2224111 A G
> 2 1 2319424 A G
> 3 1 2926730 C T
> 4 1 3084986 C G
> 5 1 3155127 A C
> 6 1 3695086 C G
> 7 1 3710825 A G
> 8 1 3753024 A G
> 9 1 3753427 A G
> 10 1 3756100 A G
> 11 1 3756146 A C
> 12 1 4240737 A G
> 13 1 4243294 C G
> 14 1 4243405 A C
> 15 1 4243441 C T
> 16 1 1145994 C T
> 17 1 2543484 C T
> 18 1 2941694 C T
> 19 1 3292731 C T
> 20 1 4276892 C T
>
> If you don't know any SQL, note that there is a mixture of " and ' in that
> paste statement, as we want to end up with a query that looks like this:
>
> "select chrom, physical_pos, allele_a, allele_b from featureSet where
> man_fsetid in
> ('SNP_A-2131660','SNP_A-1967418','SNP_A-1969580','SNP_A-4263484','SNP_A-1978185','SNP_A-4264431','SNP_A-1980898','SNP_A-1983139','SNP_A-4265735','SNP_A-1995832','SNP_A-1995893','SNP_A-1997689','SNP_A-1997709','SNP_A-1997896','SNP_A-1997922','SNP_A-2000230','SNP_A-2000332','SNP_A-2000337','SNP_A-2000342','SNP_A-4268173');"
>
> Best,
>
> Jim
>
>
>
>>
>> Chromosome Position Reference Base Call
>> chr19 2094894 A T
>> chr19 2095300 G A
>>
>>
>> Is it possible through bioconductor? Thanks for your time.
>>
>> -Adrian
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826