[BioC] Transcript clusters missing from hugene20sttranscriptcluster.db

James W. MacDonald jmacdon at uw.edu
Wed Jun 4 15:53:54 CEST 2014


Hi Adam,

On 6/3/2014 6:07 PM, Cornwell, Adam wrote:
> Hello,
>
> I've been working with hugene20sttranscriptcluster.db_2.14.0 (the most recent release version) for the last couple of days, and noticed that some of our usual marker genes appear to be missing from the annotation package. These genes are present in current and previous versions of the Affymetrix probe-to-gene mappings from NetAffx.
> For example, transcript cluster 16966809 should correspond to gene symbol PDGFRA and Entrez ID 5156 (which is included in the NA34 annotation release for the platform), but any(mappedkeys(hugene20sttranscriptclusterSYMBOL) == "16966809") returns FALSE. Picking a random transcript cluster, 16748695 (PDE6H), returns TRUE, and the symbol is returned. I'm not sure whether other genes are missing as well, since I only stumbled across this one by chance.
>
> For now I can try to build an annotation database from the Affymetrix annotation. Am I missing something, or can someone else confirm that genes are missing?
>
> Quick copy-paste example:
> library(hugene20sttranscriptcluster.db)
> library(annotate)
> any(mappedkeys(hugene20sttranscriptclusterSYMBOL) == "16966809")

What you are missing is that this probeset maps to two gene symbols, and is
thus masked in the conventional get() and Bimap interfaces, which by default
hide probesets that map to more than one gene. toggleProbes() unmasks them:

 > get("16966809", hugene20sttranscriptclusterSYMBOL)
[1] NA
 > any(mappedkeys(hugene20sttranscriptclusterSYMBOL) == "16966809")
[1] FALSE
 > z <- toggleProbes(hugene20sttranscriptclusterSYMBOL, "all")
 > get("16966809", z)
[1] "PDGFRA" "FIP1L1"
 > any(mappedkeys(z) == "16966809")
[1] TRUE
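To make the masking concrete, here is a toy emulation in base R. It is not how AnnotationDbi implements Bimap masking; the list `probe2sym` and the helpers `n_symbols()`, `single_keys()` and `all_keys()` are hypothetical, and the two mappings are copied from the session above.

```r
## Toy emulation in base R; NOT AnnotationDbi's implementation.
## probe2sym, n_symbols(), single_keys() and all_keys() are hypothetical names.
probe2sym <- list(
  "16966809" = c("PDGFRA", "FIP1L1"),  # transcript cluster with two symbols
  "16748695" = "PDE6H"                 # transcript cluster with one symbol
)

## Number of symbols per probe ID (vapply keeps the names)
n_symbols <- function(m) vapply(m, length, integer(1L))

## Like the default view: only probes mapping to exactly one symbol
single_keys <- function(m) names(m)[n_symbols(m) == 1L]

## Like toggleProbes(x, "all"): every probe with at least one symbol
all_keys <- function(m) names(m)[n_symbols(m) >= 1L]

"16966809" %in% single_keys(probe2sym)  # FALSE: masked
"16966809" %in% all_keys(probe2sym)     # TRUE: visible
```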


These older methods have been supplanted by the select() method, which 
you should use instead:

 > select(hugene20sttranscriptcluster.db, "16966809", "SYMBOL")
    PROBEID SYMBOL
1 16966809 PDGFRA
2 16966809 FIP1L1
Warning message:
In .generateExtraRows(tab, keys, jointype) :
   'select' resulted in 1:many mapping between keys and return rows
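If one row per transcript cluster is more convenient downstream, the 1:many rows returned by select() can be collapsed. The sketch below uses only base R; `collapse_by_probe()` and the `" // "` separator are hypothetical choices, and `res` is a hand-copied version of the output above (in a live session it would come from the select() call itself).

```r
## Hypothetical helper, not part of AnnotationDbi.
## In a live session: res <- select(hugene20sttranscriptcluster.db, keys, "SYMBOL")
res <- data.frame(
  PROBEID = c("16966809", "16966809"),
  SYMBOL  = c("PDGFRA", "FIP1L1"),
  stringsAsFactors = FALSE
)

collapse_by_probe <- function(df, key = "PROBEID", value = "SYMBOL", sep = " // ") {
  ## Group values by key, drop NAs and duplicates, then paste them together
  vals <- tapply(df[[value]], df[[key]],
                 function(v) paste(unique(v[!is.na(v)]), collapse = sep))
  out <- data.frame(names(vals), as.vector(vals), stringsAsFactors = FALSE)
  names(out) <- c(key, value)
  out
}

collapse_by_probe(res)  # one row; SYMBOL is "PDGFRA // FIP1L1"
```

Collapsing keeps every symbol visible, unlike the masked Bimap view, at the cost of a combined string you may need to split again later.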


Best,

Jim


>
>
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines   parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] hugene20sttranscriptcluster.db_2.14.0 OrderedList_1.36.0                    twilight_1.40.0                       BiocInstaller_1.14.2
>   [5] doParallel_1.0.8                      iterators_1.0.7                       limma_3.20.4                          gplots_2.13.0
>   [9] xlsx_0.5.5                            xlsxjars_0.6.0                        rJava_0.9-6                           annotate_1.42.0
> [13] SCAN.UPC_2.6.0                        sva_3.10.0                            mgcv_1.7-29                           nlme_3.1-117
> [17] corpcor_1.6.6                         foreach_1.4.2                         affyio_1.32.0                         affy_1.42.2
> [21] GEOquery_2.30.0                       oligo_1.28.2                          Biostrings_2.32.0                     XVector_0.4.0
> [25] IRanges_1.22.7                        oligoClasses_1.26.0                   org.Hs.eg.db_2.14.0                   RSQLite_0.11.4
> [29] DBI_0.2-7                             AnnotationDbi_1.26.0                  GenomeInfoDb_1.0.2                    Biobase_2.24.0
> [33] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
> [1] affxparser_1.36.0     bit_1.1-12            bitops_1.0-6          caTools_1.17          codetools_0.2-8       ff_2.2-13             gdata_2.13.3
>   [8] GenomicRanges_1.16.3  grid_3.1.0            gtools_3.4.0          KernSmooth_2.23-12    lattice_0.20-29       MASS_7.3-31           Matrix_1.1-3
> [15] preprocessCore_1.26.1 RCurl_1.95-4.1        stats4_3.1.0          tools_3.1.0           XML_3.98-1.1          xtable_1.7-3          zlibbioc_1.10.0
>
>
> Adam Cornwell
> Programmer/Analyst
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


