[BioC] hugene10sttranscriptclusterACCNUM has no mappings

Marc Carlson mcarlson at fhcrc.org
Thu Sep 4 02:38:50 CEST 2014


Hi.  Sorry for the delay; I was out of the office for almost a week (and 
I left the day before this question popped up).  Part of the reason for 
the confusion here is that the ACCNUM field is supposed to represent 
the source accessions that were used when designing the package.  In 
that sense ACCNUM is something of an anachronism, since I don't think 
people really design chips this way very much anymore.  The bimap is 
still present largely for backwards compatibility.  This is why the man 
page for the ACCNUM mapping says this:

"For chip packages such as this, the ACCNUM mapping comes directly from 
the manufacturer.  This is different from other mappings which are 
mapped onto the probes via an Entrez Gene identifier."

Anyhow, the code that builds the ChipDb package proceeds under the 
notion that you would only "have" those special ACCNUM values if they 
were listed in your primary (fileName) set of keys.  That is, if the 
probes are not really based on GenBank accessions, then you don't really 
have any ACCNUM values anyway, and in that case the field should 
probably be left out entirely.
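
To make that concrete, here is a toy model in base R of the behaviour 
described above. It is only a sketch and not the SQLForge source: the 
function count_probe_accessions() and both small tables are hypothetical, 
with illustrative values. It assumes that accessions are counted only 
when the primary file is itself keyed on accessions (baseMapType gb or 
gbNRef).

```r
# Toy model only: NOT the SQLForge code. All names below are hypothetical.
count_probe_accessions <- function(primary, base_map_type) {
  # Assumption: accessions are recorded only when the primary file
  # itself is keyed on GenBank/RefSeq accessions.
  if (base_map_type %in% c("gb", "gbNRef")) {
    sum(!is.na(primary$id))
  } else {
    0L
  }
}

# Primary file keyed on Entrez Gene IDs (illustrative values)
probe_to_gene <- data.frame(
  probe = c("p1", "p2", "p3"),
  id    = c("1017", "7157", NA),
  stringsAsFactors = FALSE
)

# Primary file keyed on RefSeq accessions (illustrative values)
probe_to_acc <- data.frame(
  probe = c("p1", "p2", "p3"),
  id    = c("NM_001798", "NM_000546", NA),
  stringsAsFactors = FALSE
)

n_eg <- count_probe_accessions(probe_to_gene, "eg")
n_gb <- count_probe_accessions(probe_to_acc, "gbNRef")

cat("eg primary:     Found", n_eg, "Probe Accessions\n")
cat("gbNRef primary: Found", n_gb, "Probe Accessions\n")
```

In this toy setup the same probes give zero accessions with the Gene ID 
file as primary and a non-zero count with the accession file as primary, 
which is the same pattern as in the two makeDBPackage() runs quoted below.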

So if you don't have legitimate ACCNUM values (that is, you are not 
dealing with an old chip where these really are the primary initial keys 
that everything was based on), then I don't think you should fake 
them into the package by listing them first, because what you would 
effectively be doing is inadvertently resurrecting old retired IDs from 
the dead.  Yes, you can extract old dead accession numbers that way, but 
I don't think it's best practice to do so.  Those IDs were presumably 
retired for a reason.
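
If what you actually want is current accessions for the probes, one 
option is to go through the Entrez Gene-based REFSEQ mapping instead of 
ACCNUM. The following is a minimal sketch, assuming the 
hugene10sttranscriptcluster.db package is installed; it uses mappedkeys() 
and select() from AnnotationDbi, and the choice of columns is only an 
example.

```r
# Sketch: look up RefSeq accessions via the Entrez Gene-based REFSEQ
# mapping rather than ACCNUM. Skipped if the package is not installed.
res <- NULL
if (requireNamespace("hugene10sttranscriptcluster.db", quietly = TRUE)) {
  library(hugene10sttranscriptcluster.db)

  # Probes that have at least one RefSeq accession
  refseq_probes <- mappedkeys(hugene10sttranscriptclusterREFSEQ)

  # Tabular lookup for the first few of those probes
  res <- AnnotationDbi::select(
    hugene10sttranscriptcluster.db,
    keys    = head(refseq_probes),
    columns = c("REFSEQ", "ENTREZID"),
    keytype = "PROBEID"
  )
  print(head(res))
}
```

These accessions are attached to the probes through their Entrez Gene 
IDs, so they follow the current annotation rather than whatever IDs the 
manufacturer originally listed.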

I hope this helps to explain things better,


  Marc




On 08/29/2014 07:15 AM, James W. MacDonald wrote:
> Hi Thomas,
>
> I built that package, and as you note, there are no accession numbers. 
> But maybe that is because I misunderstand something, so I am directly 
> including Marc Carlson in this conversation.
>
> Since the annotation packages are Gene ID-centric, I create two files, 
> one with probeid->GeneID, and one with probeid->GenBank/RefSeq ID. I 
> then use the first file as the primary annotation file, and the second 
> as the 'otherSrc' file. If I then run makeDBPackage(), I get this output:
>
> baseMapType is eg
> Prepending Metadata
> Creating Genes table
> Appending Probes
> Found 0 Probe Accessions
> Appending Gene Info
> Found 19962 Gene Names
> Found 19962 Gene Symbols
> <snip>
>
> But if I then reverse the source files, using the second file as the 
> primary annotation file, and the GeneID file as the 'otherSrc' file, I 
> get:
>
> baseMapType is gb or gbNRef
> Prepending Metadata
> Creating Genes table
> Appending Probes
> Found 21941 Probe Accessions
> Appending Gene Info
> Found 20195 Gene Names
> Found 20195 Gene Symbols
> <snip>
>
> From my understanding of the SQLForge vignette, I should be able to 
> use either ordering, and get identical results, but obviously this is 
> not the case. Marc, can you shed some light on this? Evidently I 
> should re-make the packages using gbNRef rather than eg as the 
> baseMapType.
>
> Best,
>
> Jim
>
>
>
>
> On Fri, Aug 29, 2014 at 4:30 AM, Thomas Pfau <thomas.pfau at uni.lu> wrote:
>
>     Hello,
>
>     I just tried to get a probe-to-accession mapping from the above
>     annotation database. In particular, it does not yield any mappings
>     for accessions, i.e.
>     x <- hugene10sttranscriptclusterACCNUM
>     mapped_probes <- mappedkeys(x)
>     yields an empty mapped_probes list.
>
>
>     I'm running R 3.1.1 on Ubuntu.
>     The loaded packages are:
>
>      [1] oligo_1.28.2 Biostrings_2.32.1 XVector_0.4.0
>      [4] IRanges_1.22.10 oligoClasses_1.26.0 hugene10sttranscriptcluster.db_8.1.0
>      [7] org.Hs.eg.db_2.14.0 RSQLite_0.11.4 DBI_0.2-7
>     [10] AnnotationDbi_1.26.0 GenomeInfoDb_1.0.2 Biobase_2.24.0
>     [13] BiocGenerics_0.10.0 BiocInstaller_1.14.2
>
>     and capture.output(hugene10sttranscriptcluster()) yields:
>      [1] "Quality control information for hugene10sttranscriptcluster:"
>      [2] ""
>      [3] ""
>      [4] "This package has the following mappings:"
>      [5] ""
>      [6] "hugene10sttranscriptclusterACCNUM has 0 mapped keys (of 33297 keys)"
>      [7] "hugene10sttranscriptclusterALIAS2PROBE has 60778 mapped keys (of 103510 keys)"
>      [8] "hugene10sttranscriptclusterCHR has 19962 mapped keys (of 33297 keys)"
>      [9] "hugene10sttranscriptclusterCHRLENGTHS has 93 mapped keys (of 93 keys)"
>     [10] "hugene10sttranscriptclusterCHRLOC has 19424 mapped keys (of 33297 keys)"
>     [11] "hugene10sttranscriptclusterCHRLOCEND has 19424 mapped keys (of 33297 keys)"
>     [12] "hugene10sttranscriptclusterENSEMBL has 19416 mapped keys (of 33297 keys)"
>     [13] "hugene10sttranscriptclusterENSEMBL2PROBE has 20590 mapped keys (of 28046 keys)"
>     [14] "hugene10sttranscriptclusterENTREZID has 19962 mapped keys (of 33297 keys)"
>     [15] "hugene10sttranscriptclusterENZYME has 2201 mapped keys (of 33297 keys)"
>     [16] "hugene10sttranscriptclusterENZYME2PROBE has 958 mapped keys (of 975 keys)"
>     [17] "hugene10sttranscriptclusterGENENAME has 19962 mapped keys (of 33297 keys)"
>     [18] "hugene10sttranscriptclusterGO has 17412 mapped keys (of 33297 keys)"
>     [19] "hugene10sttranscriptclusterGO2ALLPROBES has 17930 mapped keys (of 18078 keys)"
>     [20] "hugene10sttranscriptclusterGO2PROBE has 13970 mapped keys (of 14134 keys)"
>     [21] "hugene10sttranscriptclusterMAP has 19832 mapped keys (of 33297 keys)"
>     [22] "hugene10sttranscriptclusterOMIM has 13778 mapped keys (of 33297 keys)"
>     [23] "hugene10sttranscriptclusterPATH has 5768 mapped keys (of 33297 keys)"
>     [24] "hugene10sttranscriptclusterPATH2PROBE has 229 mapped keys (of 229 keys)"
>     [25] "hugene10sttranscriptclusterPFAM has 18146 mapped keys (of 33297 keys)"
>     [26] "hugene10sttranscriptclusterPMID has 19726 mapped keys (of 33297 keys)"
>     [27] "hugene10sttranscriptclusterPMID2PROBE has 396421 mapped keys (of 412133 keys)"
>     [28] "hugene10sttranscriptclusterPROSITE has 18146 mapped keys (of 33297 keys)"
>     [29] "hugene10sttranscriptclusterREFSEQ has 19873 mapped keys (of 33297 keys)"
>     [30] "hugene10sttranscriptclusterSYMBOL has 19962 mapped keys (of 33297 keys)"
>     [31] "hugene10sttranscriptclusterUNIGENE has 19578 mapped keys (of 33297 keys)"
>     [32] "hugene10sttranscriptclusterUNIPROT has 18193 mapped keys (of 33297 keys)"
>     [33] ""
>     [34] ""
>     [35] "Additional Information about this package:"
>     [36] ""
>     [37] "DB schema: HUMANCHIP_DB"
>     [38] "DB schema version: 2.1"
>     [39] "Organism: Homo sapiens"
>     [40] "Date for NCBI data: 2014-Mar13"
>     [41] "Date for GO data: 20140308"
>     [42] "Date for KEGG data: 2011-Mar15"
>     [43] "Date for Golden Path data: 2010-Mar22"
>     [44] "Date for Ensembl data: 2014-Feb26"
>
>     It seems like something is broken there, as shown in line 6:
>      [6] "hugene10sttranscriptclusterACCNUM has 0 mapped keys (of 33297 keys)"
>
>     Any ideas on how to solve this? Or whether this is a bug on my
>     side or on the package side?
>
>     Kind Regards
>
>     Thomas
>
>
>     -- 
>     Université du Luxembourg
>     Faculté des Sciences, de la Technologie et de la Communication
>     Campus Limpertsberg, BRB 2.13
>     162a, avenue de la Faïencerie
>     L-1511 Luxembourg
>     Email: thomas.pfau at uni.lu
>
>
>
>
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
