[BioC] Affy's 500K SNP arrays - retrieval of probe info

Wed May 30 08:56:55 CEST 2007

Hi Ben,

I did not realise that there are SNPs that are not represented by a central
'position 0' probe.

Thanks for this clarification!
An

-----Original Message-----
From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
Sent: Tuesday, 29 May 2007 17:50
To: De Bondt, An-7114 [PRDBE]
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Affy's 500K SNP arrays - retrieval of probe info

Hi An,

There is no direct association between the number of SNPs and the  
number of probes whose offset is zero.

On the 250K designs, given a SNP, the number of probes across offsets  
is usually unbalanced.

Here is a little piece of (ugly) code to clarify what I mean:

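library(oligo)  # for db(); the annotation package must be installed as well
ann = "pd.mapping250k.sty"
library(ann, character.only = TRUE)  # so that get(ann) finds the object
fields = "man_fsetid, pmfeature.strand, allele, offset"
tbls = "pmfeature, sequence, featureSet"
conditions = paste("pmfeature.fid = sequence.fid AND",
                   "featureSet.fsetid = pmfeature.fsetid")
sql = paste("SELECT", fields, "FROM", tbls, "WHERE", conditions)
# one row per PM probe: SNP id, strand, allele and offset
tmp = dbGetQuery(db(get(ann)), sql)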

table(tmp[["man_fsetid"]], tmp[["offset"]])[1:10,]
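
To see the shape of that table without loading the annotation package, here is a minimal base-R sketch on made-up data. The SNP ids and offsets below are invented for illustration and are not taken from pd.mapping250k.sty; only the column names man_fsetid and offset mirror the query above.

```r
# Toy stand-in for the query result 'tmp'; all values are invented.
tmp <- data.frame(
  man_fsetid = c("SNP_A", "SNP_A", "SNP_A", "SNP_A",
                 "SNP_B", "SNP_B", "SNP_B",
                 "SNP_C", "SNP_C", "SNP_C", "SNP_C", "SNP_C"),
  offset     = c(-4, 0, 0, 2,
                 -2, 1, 3,
                  0, 0, 0, 0, 4),
  stringsAsFactors = FALSE
)

# Rows are SNPs, columns are offsets, cells are probe counts.
counts <- table(tmp[["man_fsetid"]], tmp[["offset"]])
print(counts)

# Number of offset-0 probes per SNP (0 when a SNP has no central probe).
central <- counts[, "0"]
print(central)

# SNPs that have no central probe at all.
noCentral <- names(central)[central == 0]
print(noCentral)
```

In this toy example the three SNPs carry 2, 0 and 4 central probes, so the total number of offset-0 probes says little about the number of SNPs.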

Hope this helps,

b

On May 29, 2007, at 6:54 AM, De Bondt, An-7114 [PRDBE] wrote:

>
> Exactly, Ben, thanks a lot !
>
> Applying this to the Sty-based feature set (6553600 rows) results in
> 3 vectors, each of length 3201544 (the other 3352056 rows correspond
> to MM probes), and the centralSnps vector of length 454224.
>
> What I do not understand yet:
> The number of rows after snprma() is 238304 for Sty. How is that
> number related to the length of centralSnps?
> I had expected the length of centralSnps to be 4 times the number
> of rows after snprma():
>      one central snp for alleleA on the sense strand
>      one central snp for alleleB on the sense strand
>      one central snp for alleleA on the antisense strand
>      one central snp for alleleB on the antisense strand
>
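
The arithmetic behind that expectation can be checked quickly. This is only a sketch using the counts reported in this message, and it assumes that each row after snprma() corresponds to one SNP; the variable names are made up for the example.

```r
nSnps    <- 238304    # rows after snprma() for Sty
nCentral <- 454224    # length(centralSnps)
nPM      <- 3201544   # PM rows
nMM      <- 3352056   # MM rows
nRows    <- 6553600   # rows in the Sty feature set

# PM and MM rows together account for the whole feature set.
stopifnot(nPM + nMM == nRows)

# Expected count if every SNP had exactly 4 central probes.
expected <- 4 * nSnps
print(expected)

# Observed average number of central probes per SNP.
print(nCentral / nSnps)
```

The expected count is 953216, while the observed average is about 1.9 central probes per SNP, so the "4 per SNP" assumption cannot hold for every SNP.
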
> Kind regards,
> An
>
>
> -----Original Message-----
> From: Benilton Carvalho [mailto:bcarvalh at jhsph.edu]
> Sent: Friday, 25 May 2007 14:33
> To: De Bondt, An-7114 [PRDBE]
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Affy's 500K SNP arrays - retrieval of probe info
>
>
> Hi An,
>
> I'm assuming you want the offset and GC content for the PM probes, ok?
>
> Say your probe-level data (SnpFeature object) is called "rawData".
>
> theOffset <- pmPosition(get(annotation(rawData)))
> theSequences <- pmSequence(get(annotation(rawData)))
>
> centralSnps <- which(theOffset == 0)
> # count G/C matches; gregexpr() returns -1 when there is no match
> percentGC <- sapply(gregexpr("[GC]", theSequences),
>                     function(m) sum(m > 0))/25
>
> b
>
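
A minimal base-R sketch of counting G/C bases per 25-mer, run on made-up sequences rather than real pmSequence() output. One detail worth checking on real data: gregexpr() returns the single value -1 when a sequence contains no match, so taking length() of its result would count a probe without any G or C as having one.

```r
# Invented 25-mers standing in for pmSequence() output.
theSequences <- c(
  strrep("G", 25),                # 25 G/C bases
  strrep("A", 25),                # no G/C bases
  paste0(strrep("ACGT", 6), "A")  # 12 G/C bases
)
stopifnot(all(nchar(theSequences) == 25))

# length() counts the -1 "no match" marker as one match.
byLength <- sapply(gregexpr("G|C", theSequences), length) / 25

# Counting only positive match positions handles the no-match case.
byCount <- sapply(gregexpr("[GC]", theSequences),
                  function(m) sum(m > 0)) / 25

print(rbind(byLength, byCount))
```

The two versions agree except for the all-A sequence, where the length()-based count gives 1/25 instead of 0. Both return fractions between 0 and 1; multiply by 100 for a percentage.
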
> On May 25, 2007, at 8:00 AM, De Bondt, An-7114 [PRDBE] wrote:
>
>> Dear,
>>
>> From the raw probe-level data, we would like to select only the
>> central SNP probes (position 0, with the SNP position exactly in the
>> middle) from the sense as well as from the antisense strand.  How can
>> we do this?
>>
>> We know we can get the GC content from that central probe based on
>> the 'Mapping250K_Nsp snp info.txt' file.  How can we get %GC for each
>> of the other probes as well? Is there a cdf for the Nsp and Sty
>> arrays? Or can we get this info out of the
>> pd.mapping250k.nsp/pd.mapping250k.sty? Or is there another way to get
>> that info?
>>
>> Thanks in advance for your help!
>>
>> Regards,
>> An
>
>
> --
> Benilton Carvalho
> PhD Candidate
> Department of Biostatistics
> Bloomberg School of Public Health
> Johns Hopkins University
> bcarvalh at jhsph.edu
>