[BioC] pd.hugene.2.0.st missing normgene->exon mappings

James W. MacDonald jmacdon at uw.edu
Tue Jul 9 19:13:58 CEST 2013


Hi Mark,

Thanks for the heads-up. We already knew that Affy messed up the
transcript and probeset annotation files (and had them fixed), but I
didn't think I needed to check the others. Famous last words, no?

 > x <- readPgf("HuGene-2_0-st.pgf")
 > table(x$probesetType)

              control->affx   control->affx->bac_spike
                         18                         18
  control->affx->ercc_spike control->affx->polya_spike
                         92                         39
  control->bgp->antigenomic                       main
                         23                     349012
           normgene->intron                   reporter
                       3575                         82

 > y <- read.csv("HuGene-2_0-st-v1.na33.2.hg19.transcript.csv",
 +               comment.char = "#", stringsAsFactors = FALSE, header = TRUE)
 > table(y$category)

              control->affx   control->affx->bac_spike
                         18                         18
  control->affx->ercc-spike control->affx->polya_spike
                         92                         39
  control->bgp->antigenomic                       main
                         23                      44629
             normgene->exon           normgene->intron
                       1626                       3575
                   reporter                     rescue
                         82                       3515
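
The mismatch between the two tables can also be checked programmatically. The following is a minimal base-R sketch, not part of the original session: the counts are copied by hand from the two table() outputs above rather than re-read from the files, and the names pgf_counts, csv_counts, only_in_csv and only_in_pgf are made up for illustration.

```r
# Counts copied by hand from the two table() outputs above
# (illustrative only; not re-read from the pgf or csv files).
pgf_counts <- c(
  "control->affx" = 18, "control->affx->bac_spike" = 18,
  "control->affx->ercc_spike" = 92, "control->affx->polya_spike" = 39,
  "control->bgp->antigenomic" = 23, "main" = 349012,
  "normgene->intron" = 3575, "reporter" = 82
)
csv_counts <- c(
  "control->affx" = 18, "control->affx->bac_spike" = 18,
  "control->affx->ercc-spike" = 92, "control->affx->polya_spike" = 39,
  "control->bgp->antigenomic" = 23, "main" = 44629,
  "normgene->exon" = 1626, "normgene->intron" = 3575,
  "reporter" = 82, "rescue" = 3515
)

# Category names that appear in the transcript csv but not in the pgf
only_in_csv <- setdiff(names(csv_counts), names(pgf_counts))
# Category names that appear in the pgf but not in the transcript csv
only_in_pgf <- setdiff(names(pgf_counts), names(csv_counts))

print(only_in_csv)
print(only_in_pgf)
```

The ercc entries show up on both sides only because the two files spell the name differently (underscore in the pgf, hyphen in the csv), so the categories that are really absent from the pgf are normgene->exon and rescue. With the x and y objects from the session above, the same comparison would presumably be setdiff(unique(y$category), unique(x$probesetType)).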

I'll ping Affymetrix and see what they have to say.

Best,

Jim



On 7/9/2013 3:29 AM, Mark Cowley wrote:
> Dear Benilton, James & Bioconductors,
> Thanks for providing the platform design packages for hugene/mogene/ragene 1.0/1.1/2.0/2.1 arrays.
>
> I use SQL to query these packages & ultimately retain only 'main' probes in my analysis. This works well for the 1.0 and 1.1 packages, but not for the 2.0 and 2.1 ST arrays. For 2.0 and 2.1 arrays, the normgene->exon control probes are misclassified as 'main' probes.
>
> Evidence: the NetAffx csv file HuGene-2_0-st-v1.na33.2.hg19.transcript.csv lists 1626 normgene->exon probes, whereas the pd.hugene.2.0.st package lists 0 and assigns these 1626 probes to the 'main' category:
>
> # probe types:
> library(pd.hugene.2.0.st)
> conn <- db(pd.hugene.2.0.st)
> dbGetQuery(conn,"SELECT * from type_dict")
>     type                   type_id
> 1     1                      main
> 2     2             control->affx
> 3     3             control->chip
> 4     4 control->bgp->antigenomic
> 5     5     control->bgp->genomic
> 6     6            normgene->exon
> 7     7          normgene->intron
> 8     8  rescue->FLmRNA->unmapped
> 9     9  control->affx->bac_spike
> 10   10            oligo_spike_in
> 11   11           r1_bac_spike_at
>
> # probe counts for each of the probe categories:
> dbGetQuery(conn,"SELECT type, count(*) from featureSet GROUP BY type")
>    type count(*)
> 1   NA     3728
> 2    1   345497
> 3    2       18
> 4    4       23
> 5    7     3575
> 6    9       18
>
> NB: no type 6 probes.
> I've tested all 12 hu/mo/ra gene 1.0, 1.1, 2.0 and 2.1 ST packages, and see this issue for all 2.0 and 2.1 arrays (see below).
>
> Can these mappings please be updated?
>
> PS, there's a bunch of probes with type = NA in the database. I haven't investigated these in any detail.
>
> cheers,
> Mark
> -----------------------------------------------------
> Mark Cowley, PhD
>
> Genome Informatics Division & the Centre for Clinical Genomics
> The Kinghorn Cancer Centre, Garvan Institute of Medical Research, Sydney, Australia
> -----------------------------------------------------
>
> All 12 packages below:
> pd.packages <- c(
>    "pd.hugene.1.0.st.v1", "pd.hugene.1.1.st.v1", "pd.hugene.2.0.st", "pd.hugene.2.1.st",
>    "pd.mogene.1.0.st.v1", "pd.mogene.1.1.st.v1", "pd.mogene.2.0.st", "pd.mogene.2.1.st",
>    "pd.ragene.1.0.st.v1", "pd.ragene.1.1.st.v1", "pd.ragene.2.0.st", "pd.ragene.2.1.st"
> )
>
> a <- b <- list()
> for(pd.pkg.name in pd.packages) {
>    try({
>      require(pd.pkg.name, character.only=TRUE) || stop("Can't load the pd.package")
>      conn <- db(get(pd.pkg.name))
>      a[[pd.pkg.name]] <- dbGetQuery(conn,"SELECT type, count(*) from featureSet GROUP BY type")
>      b[[pd.pkg.name]] <- dbGetQuery(conn,"SELECT fsetid from featureSet WHERE type = 6")[,1]
>    })
> }
> dbGetQuery(conn,"SELECT * from type_dict")
>
>> a
> $pd.hugene.1.0.st.v1
>    type count(*)
> 1   NA      227
> 2    1   253002
> 3    2       57
> 4    4       45
> 5    6     1195
> 6    7     2904
>
> $pd.hugene.1.1.st.v1
>    type count(*)
> 1   NA      227
> 2    1   253002
> 3    2       57
> 4    4       45
> 5    6     1195
> 6    7     2904
>
> $pd.hugene.2.0.st
>    type count(*)
> 1   NA     3728
> 2    1   345497
> 3    2       18
> 4    4       23
> 5    7     3575
> 6    9       18
>
> $pd.hugene.2.1.st
>    type count(*)
> 1   NA     3728
> 2    1   345497
> 3    2       18
> 4    4       23
> 5    7     3575
> 6    9       18
>
> $pd.mogene.1.0.st.v1
>    type count(*)
> 1   NA       86
> 2    1   234878
> 3    2       21
> 4    4       45
> 5    6     1324
> 6    7     5222
>
> $pd.mogene.1.1.st.v1
>    type count(*)
> 1   NA       86
> 2    1   234878
> 3    2       21
> 4    4       45
> 5    6     1324
> 6    7     5222
>
> $pd.mogene.2.0.st
>    type count(*)
> 1   NA      810
> 2    1   263551
> 3    2       18
> 4    4       23
> 5    7     5331
> 6    9       18
>
> $pd.mogene.2.1.st
>    type count(*)
> 1   NA      810
> 2    1   263551
> 3    2       18
> 4    4       23
> 5    7     5331
> 6    9       18
>
> $pd.ragene.1.0.st.v1
>    type count(*)
> 1   NA      254
> 2    1   211195
> 3    2       21
> 4    4       45
> 5    6      399
> 6    7     1153
>
> $pd.ragene.1.1.st.v1
>    type count(*)
> 1   NA      254
> 2    1   211195
> 3    2       21
> 4    4       45
> 5    6      399
> 6    7     1153
>
> $pd.ragene.2.0.st
>    type count(*)
> 1   NA     1071
> 2    1   214018
> 3    2       18
> 4    4       23
> 5    7     5083
> 6    9       18
>
> $pd.ragene.2.1.st
>    type count(*)
> 1   NA     1071
> 2    1   214018
> 3    2       18
> 4    4       23
> 5    7     5083
> 6    9       18
>
>> sapply(b,length)
> pd.hugene.1.0.st.v1 pd.hugene.1.1.st.v1    pd.hugene.2.0.st    pd.hugene.2.1.st
>                 1195                1195                   0                   0
> pd.mogene.1.0.st.v1 pd.mogene.1.1.st.v1    pd.mogene.2.0.st    pd.mogene.2.1.st
>                 1324                1324                   0                   0
> pd.ragene.1.0.st.v1 pd.ragene.1.1.st.v1    pd.ragene.2.0.st    pd.ragene.2.1.st
>                  399                 399                   0                   0
>
>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>   [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] pd.ragene.2.1.st_2.12.1   pd.ragene.2.0.st_2.12.0
>   [3] pd.ragene.1.1.st.v1_3.8.0 pd.ragene.1.0.st.v1_3.8.0
>   [5] pd.mogene.2.1.st_2.12.1   pd.mogene.2.0.st_2.12.0
>   [7] pd.mogene.1.1.st.v1_3.8.0 pd.mogene.1.0.st.v1_3.8.0
>   [9] pd.hugene.2.1.st_3.8.0    pd.hugene.1.1.st.v1_3.8.0
> [11] pd.hugene.1.0.st.v1_3.8.0 pd.hugene.2.0.st_3.8.0
> [13] oligo_1.24.0              Biobase_2.20.0
> [15] oligoClasses_1.22.0       BiocGenerics_0.6.0
> [17] RSQLite_0.11.4            DBI_0.2-7
> [19] BiocInstaller_1.10.2
>
> loaded via a namespace (and not attached):
>   [1] affxparser_1.32.1     affyio_1.28.0         Biostrings_2.28.0
>   [4] bit_1.1-10            codetools_0.2-8       ff_2.2-11
>   [7] foreach_1.4.1         GenomicRanges_1.12.3  IRanges_1.18.1
> [10] iterators_1.0.6       preprocessCore_1.22.0 splines_3.0.0
> [13] stats4_3.0.0          tools_3.0.0           zlibbioc_1.6.0
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


