[BioC] Fwd: Annotation discrepancy

James W. MacDonald jmacdon at uw.edu
Fri Dec 20 21:02:10 CET 2013


Hi Eric,

Good point. So let's look, shall we?

 > library(hthgu133pluspmprobe)
 > library(hthgu133pluspmcdf)
 > ht <- as.data.frame(hthgu133pluspmprobe)
 > prb.lst <- tapply(1:nrow(ht), ht$Probe.Set.Name, function(x) ht[x,2:3])
 > cdf.lst <- mget(ls(hthgu133pluspmcdf), hthgu133pluspmcdf)
 > names(prb.lst) <- tolower(names(prb.lst)) ## because stupid Affy can't keep their names consistent
 > names(cdf.lst) <- tolower(names(cdf.lst))
 > all.equal(names(prb.lst), names(cdf.lst))
[1] TRUE
 > prb.lst.len <- sapply(prb.lst, nrow)
 > cdf.lst.len <- sapply(cdf.lst, nrow)
 > all.equal(prb.lst.len, cdf.lst.len)
[1] "Mean relative difference: 427.25"
 > length(which(prb.lst.len != cdf.lst.len))
[1] 40
 > cbind(prb.lst.len, cdf.lst.len)[prb.lst.len != cdf.lst.len,]
                         prb.lst.len cdf.lst.len
affx-nonspecificgc10_at           1         952
affx-nonspecificgc11_at           1         960
affx-nonspecificgc12_at           1         973
affx-nonspecificgc13_at           1         968
affx-nonspecificgc14_at           1         960
affx-nonspecificgc15_at           1         949
affx-nonspecificgc16_at           1         963
affx-nonspecificgc17_at           1         942
affx-nonspecificgc18_at           1         912
affx-nonspecificgc19_at           1         849
affx-nonspecificgc20_at           1         813
affx-nonspecificgc21_at           1         697
affx-nonspecificgc22_at           1         585
affx-nonspecificgc23_at           1         407
affx-nonspecificgc24_at           1         268
affx-nonspecificgc25_at           1           9
affx-nonspecificgc3_at            1          25
affx-nonspecificgc4_at            1         322
affx-nonspecificgc5_at            1         703
affx-nonspecificgc6_at            1         873
affx-nonspecificgc7_at            1         914
affx-nonspecificgc8_at            1         940
affx-nonspecificgc9_at            1         959
affx-r2-taga_at                   1          11
affx-r2-tagb_at                   1          11
affx-r2-tagc_at                   1          11
affx-r2-tagd_at                   1          11
affx-r2-tage_at                   1          11
affx-r2-tagf_at                   1          11
affx-r2-tagg_at                   1          11
affx-r2-tagh_at                   1          11
affx-r2-tagin-3_at                1          11
affx-r2-tagin-5_at                1          11
affx-r2-tagin-m_at                1          11
affx-r2-tagj-3_at                 1          11
affx-r2-tagj-5_at                 1          11
affx-r2-tago-3_at                 1          11
affx-r2-tago-5_at                 1          11
affx-r2-tagq-3_at                 1          11
affx-r2-tagq-5_at                 1          11

So there you go - there are a bunch of control probesets of different
sorts for which Affy gives us a single sequence in the probe package, but
for which the cdf lists many probes. Netaffx seems unwilling to say much
about the nonspecificgc probes, but it does say there are 11 individual
probe sequences for, e.g., affx-r2-tagin-3_at.
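
As a sanity check, the extra cdf probes in the table above should account
for the 17090-sequence gap from the original question (536460 - 519370).
A minimal sketch that uses only the counts printed above, so it runs without
the annotation packages; the vector names here are just for illustration:

```r
## cdf probe counts for the 40 discordant probesets, copied from the table above
nonspecificgc <- c(952, 960, 973, 968, 960, 949, 963, 942, 912, 849, 813, 697,
                   585, 407, 268, 9, 25, 322, 703, 873, 914, 940, 959)
r2.tag <- rep(11, 17)                ## the 17 affx-r2-tag* probesets
cdf.len <- c(nonspecificgc, r2.tag)
prb.len <- rep(1, length(cdf.len))   ## the probe package has one row for each

surplus <- sum(cdf.len - prb.len)    ## probes in the cdf but not in the probe package
gap <- 536460 - 519370               ## pmindex() count minus probe package count
surplus == gap                       ## TRUE
surplus / sum(prb.len)               ## 427.25, the "Mean relative difference" above
```

With the packages loaded, sum(cdf.lst.len - prb.lst.len) on the objects from
the session above should presumably give the same number, since the other
probesets agree in length.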

Best,

Jim



On 12/20/2013 2:15 PM, Eric Zollars wrote:
> Jim-
> Thanks for the response.
>
> However, in the hgu133plus2probe package there is complete agreement
> between what is in the probe package and what the Affybatch object reports
> (604258 sequences).
>
> Why would that be so?
>
>
> On Fri, Dec 20, 2013 at 2:05 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>> Hi Eric,
>>
>> Most if not all of those probes are the oligo-dT probes that surround the
>> chip (and I believe there are some in the middle as well). These probes are
>> used by the scanner as 'landing lights' to allow the scanner to accurately
>> align to the array prior to doing the scan.
>>
>> The scanner does collect data from these probes, which ends up in the cel
>> file, but they are then ignored when the array is processed further.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 12/20/2013 1:28 PM, Eric Zollars wrote:
>>
>>> All-
>>>
>>> I have been attempting to compare sequences on the HGU133 Plus 2.0 chip to
>>> the HT HGU 133+ PM.
>>> I am doing this to compare values of vectors in frma.
>>>
>>> The HT chip is a subset of HGU133 Plus 2.0 with mismatch probes removed
>>> and some probesets reduced in size.
>>>
>>> Looking at the probe package:
>>>
>>> hthgu133pluspmprobe$sequence: 519370
>>>
>>> However, when looking at an Affybatch object made from HT CEL files:
>>> Taking an Affybatch object: 'dat'
>>>
>>> Index <- pmindex(dat)
>>> tv = unlist(Index)
>>> length(tv)   #536460
>>>
>>> It appears that the Affybatch reports that there are 536460 sequences and
>>> the hthgu133pluspmprobe package is reporting only 519370.
>>>
>>> What is the difference?  Is it possible to find the information on the
>>> 17090 sequences not in the hthgu133pluspmprobe package?
>>>
>>> Thanks for any information or direction.
>>>
>>> Eric Zollars
>>>
>>> Session info below: bioconductor 2.13, R 3.0.2
>>>
>>>   sessionInfo()
>>> R version 3.0.2 (2013-09-25)
>>> Platform: i386-w64-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>>> States.1252
>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> base
>>>
>>> other attached packages:
>>> [1] affy_1.40.0                hthgu133pluspmcdf_2.13.0
>>> hgu133plus2frmavecs_1.3.0
>>> [4] hgu133plus2probe_2.13.0    hthgu133pluspmprobe_2.13.0
>>> AnnotationDbi_1.24.0
>>> [7] Biobase_2.22.0             BiocGenerics_0.8.0
>>> BiocInstaller_1.12.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.30.0         DBI_0.2-7             IRanges_1.20.6
>>> [4] preprocessCore_1.24.0 RSQLite_0.11.4        stats4_3.0.2
>>> [7] tools_3.0.2           zlibbioc_1.8.0
>>>
>>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
>>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


