[BioC] SNP chip

James W. MacDonald jmacdon at med.umich.edu
Thu Jul 8 22:41:39 CEST 2010


Hi Adrian,

On 7/8/2010 12:59 PM, Adrian Johnson wrote:
> Hi:
>
>
> I have a snp array 6 (affymetrix) data table that looks like the following:
>
>
> SNPID	                Call
> SNP_A-8373748	BB
> SNP_A-2210818	AB
> SNP_A-4290346	BB
> SNP_A-2219708	AA
>
>
> I want to be able to convert this data into a .BED file like format
> that will look like following:

Well, doing what you want will be difficult, as I don't know how you are 
going from genotype to 'Reference Base' and 'Call', nor do I really know 
what you mean by either of those in this situation. Anyway, that's a YP, 
not an MP (for you Boogie Nights fans out there ;-D).

Here is, I think, what you need to know to get close to what you are 
trying to do.

 > library(pd.genomewidesnp.6)
 > con <- db(pd.genomewidesnp.6)
## fake up some IDs
 > ids <- dbGetQuery(con, "select man_fsetid from featureSet limit 20;")
 > ids
       man_fsetid
1  SNP_A-2131660
2  SNP_A-1967418
3  SNP_A-1969580
4  SNP_A-4263484
5  SNP_A-1978185
6  SNP_A-4264431
7  SNP_A-1980898
8  SNP_A-1983139
9  SNP_A-4265735
10 SNP_A-1995832
11 SNP_A-1995893
12 SNP_A-1997689
13 SNP_A-1997709
14 SNP_A-1997896
15 SNP_A-1997922
16 SNP_A-2000230
17 SNP_A-2000332
18 SNP_A-2000337
19 SNP_A-2000342
20 SNP_A-4268173

## now a simple SQL query

 > dbGetQuery(con, paste("select chrom, physical_pos, allele_a, allele_b ",
 +     "from featureSet where man_fsetid in ('",
 +     paste(ids[,1], collapse="','"), "');", sep = ""))
    chrom physical_pos allele_a allele_b
1      1      2224111        A        G
2      1      2319424        A        G
3      1      2926730        C        T
4      1      3084986        C        G
5      1      3155127        A        C
6      1      3695086        C        G
7      1      3710825        A        G
8      1      3753024        A        G
9      1      3753427        A        G
10     1      3756100        A        G
11     1      3756146        A        C
12     1      4240737        A        G
13     1      4243294        C        G
14     1      4243405        A        C
15     1      4243441        C        T
16     1      1145994        C        T
17     1      2543484        C        T
18     1      2941694        C        T
19     1      3292731        C        T
20     1      4276892        C        T

If you don't know any SQL, note that there is a mixture of " and ' in 
that paste statement, because we want to end up with a query that looks 
like this:

"select chrom, physical_pos, allele_a, allele_b from featureSet where 
man_fsetid in 
('SNP_A-2131660','SNP_A-1967418','SNP_A-1969580','SNP_A-4263484','SNP_A-1978185','SNP_A-4264431','SNP_A-1980898','SNP_A-1983139','SNP_A-4265735','SNP_A-1995832','SNP_A-1995893','SNP_A-1997689','SNP_A-1997709','SNP_A-1997896','SNP_A-1997922','SNP_A-2000230','SNP_A-2000332','SNP_A-2000337','SNP_A-2000342','SNP_A-4268173');"
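
The quoting above can be wrapped in a small helper so the " and ' only have to be gotten right once. This is only a sketch: build_snp_query is a hypothetical name, not part of any package, and it adds man_fsetid to the select list (an assumption beyond the query shown above) so that each returned row can be tied back to its SNP ID.

```r
## Hypothetical helper: build the "in (...)" query for a vector of SNP IDs.
build_snp_query <- function(ids) {
  ## Wrap each ID in single quotes and join with commas: 'id1','id2',...
  id_list <- paste0("'", ids, "'", collapse = ",")
  ## man_fsetid is added to the select list (assumption) so rows carry their ID
  paste0("select man_fsetid, chrom, physical_pos, allele_a, allele_b ",
         "from featureSet where man_fsetid in (", id_list, ");")
}

query <- build_snp_query(c("SNP_A-2131660", "SNP_A-1967418"))
cat(query, "\n")
```

The resulting string could then be passed to dbGetQuery(con, query) with the con object created earlier.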

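To get from the annotation to something like the table you describe, the query result still has to be matched to your calls by SNP ID, since the rows are not guaranteed to come back in the order of the IDs. The sketch below uses toy data frames with made-up IDs and positions (not real annotation). It also assumes, purely for illustration, that AA/AB/BB can be spelled out using allele_a and allele_b; how that relates to 'Reference Base' and 'Call' is the part that remains up to you.

```r
## Toy stand-ins: hypothetical IDs and values, NOT real annotation data
calls <- data.frame(SNPID = c("SNP_X-1", "SNP_X-2", "SNP_X-3"),
                    Call  = c("BB", "AB", "AA"),
                    stringsAsFactors = FALSE)
annot <- data.frame(man_fsetid   = c("SNP_X-3", "SNP_X-1", "SNP_X-2"),
                    chrom        = c("1", "1", "19"),
                    physical_pos = c(300L, 100L, 200L),
                    allele_a     = c("A", "C", "A"),
                    allele_b     = c("G", "T", "C"),
                    stringsAsFactors = FALSE)

## Join on the SNP ID rather than relying on row order
merged <- merge(calls, annot, by.x = "SNPID", by.y = "man_fsetid")

## Assumed translation of AA/AB/BB into allele letters
a <- merged$allele_a
b <- merged$allele_b
merged$Genotype <- ifelse(merged$Call == "AA", paste0(a, a),
                   ifelse(merged$Call == "BB", paste0(b, b), paste0(a, b)))

## BED-like layout: chromosome, position, genotype, sorted by location
out <- data.frame(Chromosome = paste0("chr", merged$chrom),
                  Position   = merged$physical_pos,
                  Genotype   = merged$Genotype,
                  stringsAsFactors = FALSE)
out <- out[order(out$Chromosome, out$Position), ]
print(out)
```

With real data, annot would be the result of the dbGetQuery call, provided man_fsetid is included in the select list.
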
Best,

Jim



>
> Chromosome     Position        Reference Base       Call
> chr19                2094894             A                        T
> chr19                2095300             G                        A
>
>
> Is it possible through bioconductor?  Thanks for your time.
>
> -Adrian
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826


