[BioC] hugene10sttranscriptclusterACCNUM has no mappings

Thu Sep 4 08:25:55 CEST 2014

Hi Marc,

Thanks for the clarification. I stumbled over this because I had read that
newer chips often have transcript-specific probes. Since Entrez Gene
IDs do not reflect those probes, I had hoped that these
accessions would allow a more precise mapping (or at least let me
look up other database IDs that match the specific
transcripts).
Now that I know the accessions are not the way to go, I'm wondering
whether there is any linkage to transcripts at all.
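
For reference, this is roughly the lookup that is possible today. It is a
sketch only: it assumes hugene10sttranscriptcluster.db is installed, and the
variable names (wanted_columns, refseq_probes, res) are just illustrative.
The REFSEQ and ENSEMBL mappings do have keys according to the QC output
quoted below, but per the man page text Marc quotes they are attached via
the Entrez Gene ID, so I would expect them to describe the gene rather than
the particular transcript a probeset targets.

```r
# Sketch: list the gene-centric IDs attached to a few probesets.
# Assumes hugene10sttranscriptcluster.db (and hence AnnotationDbi) is installed.
wanted_columns <- c("ENTREZID", "REFSEQ", "ENSEMBL")

if (requireNamespace("hugene10sttranscriptcluster.db", quietly = TRUE)) {
  library(hugene10sttranscriptcluster.db)

  # Probesets that have at least one RefSeq ID attached
  refseq_probes <- mappedkeys(hugene10sttranscriptclusterREFSEQ)

  # One row per probeset/ID combination; IDs are reached via Entrez Gene
  res <- AnnotationDbi::select(hugene10sttranscriptcluster.db,
                               keys = head(refseq_probes),
                               columns = wanted_columns,
                               keytype = "PROBEID")
  print(res)
}
```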

Best,

Thomas

On 09/04/2014 02:38 AM, Marc Carlson wrote:
> Hi.  Sorry for the delay; I was out of the office for almost a week
> (and I left the day before this question popped up).  Part of the
> reason for the confusion here is that the ACCNUM field is supposed
> to represent the source accessions that were used when designing the
> package.  In that sense ACCNUM is something of an anachronism, since I
> don't think people design chips this way much anymore, so that bimap
> is still present largely for backwards compatibility.  This is why
> the man page for the ACCNUM mapping says this:
>
> "For chip packages such as this, the ACCNUM mapping comes directly 
> from the manufacturer.  This is different from other mappings which 
> are mapped onto the probes via an Entrez Gene identifier."
>
> Anyhow, the code that builds the ChipDb package proceeds under the
> assumption that you only "have" those special ACCNUM values if they
> were listed in your primary (fileName) set of keys.  That is, if the
> probes are not really based on GenBank accessions, then you don't
> really have any ACCNUM values anyway, and that field should (in that
> case) probably be left out entirely.
>
> So if you don't have legitimate ACCNUM values (that is, you are not
> dealing with an old chip where these really are the primary initial
> keys that everything was based on), then I don't think you should
> fake them into the package by listing them first, because effectively
> you would be inadvertently resurrecting old retired IDs from the
> dead.  I mean, yes, you can extract them like that, along with old
> dead accession numbers, but I don't think it's best practice to do
> that.  Those IDs were presumably retired for a reason.
>
> I hope this helps to explain things better,
>
>
>  Marc
>
>
>
>
> On 08/29/2014 07:15 AM, James W. MacDonald wrote:
>> Hi Thomas,
>>
>> I built that package, and as you note, there are no accession 
>> numbers. But maybe that is because I misunderstand something, so I am 
>> directly including Marc Carlson in this conversation.
>>
>> Since the annotation packages are Gene ID-centric, I create two 
>> files, one with probeid->GeneID, and one with 
>> probeid->GenBank/RefSeq ID. I then use the first file as the primary 
>> annotation file, and the second as the 'otherSrc' file. If I then run 
>> makeDBPackage(), I get this output:
>>
>> baseMapType is eg
>> Prepending Metadata
>> Creating Genes table
>> Appending Probes
>> Found 0 Probe Accessions
>> Appending Gene Info
>> Found 19962 Gene Names
>> Found 19962 Gene Symbols
>> <snip>
>>
>> But if I then reverse the source files, using the second file as the 
>> primary annotation file, and the GeneID file as the 'otherSrc' file, 
>> I get:
>>
>> baseMapType is gb or gbNRef
>> Prepending Metadata
>> Creating Genes table
>> Appending Probes
>> Found 21941 Probe Accessions
>> Appending Gene Info
>> Found 20195 Gene Names
>> Found 20195 Gene Symbols
>> <snip>
>>
>> From my understanding of the SQLForge vignette, I should be able to 
>> use either ordering, and get identical results, but obviously this is 
>> not the case. Marc, can you shed some light on this? Evidently I 
>> should re-make the packages using gbNRef rather than eg as the 
>> baseMapType.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>
>> On Fri, Aug 29, 2014 at 4:30 AM, Thomas Pfau <thomas.pfau at uni.lu> wrote:
>>
>>     Hello,
>>
>>     I just tried to get a probe-to-accession mapping from the above
>>     annotation database, but it does not yield any mappings
>>     for accessions, i.e.
>>     x <- hugene10sttranscriptclusterACCNUM
>>     mapped_probes <- mappedkeys(x)
>>     yields an empty mapped_probes list.
>>
>>
>>     I'm running R 3.1.1 on Ubuntu.
>>     The loaded packages are:
>>
>>      [1] oligo_1.28.2 Biostrings_2.32.1 XVector_0.4.0
>>      [4] IRanges_1.22.10 oligoClasses_1.26.0 hugene10sttranscriptcluster.db_8.1.0
>>      [7] org.Hs.eg.db_2.14.0 RSQLite_0.11.4 DBI_0.2-7
>>     [10] AnnotationDbi_1.26.0 GenomeInfoDb_1.0.2 Biobase_2.24.0
>>     [13] BiocGenerics_0.10.0 BiocInstaller_1.14.2
>>
>>     and capture.output(hugene10sttranscriptcluster()) yields:
>>      [1] "Quality control information for hugene10sttranscriptcluster:"
>>      [2] ""
>>      [3] ""
>>      [4] "This package has the following mappings:"
>>      [5] ""
>>      [6] "hugene10sttranscriptclusterACCNUM has 0 mapped keys (of 33297 keys)"
>>      [7] "hugene10sttranscriptclusterALIAS2PROBE has 60778 mapped keys (of 103510 keys)"
>>      [8] "hugene10sttranscriptclusterCHR has 19962 mapped keys (of 33297 keys)"
>>      [9] "hugene10sttranscriptclusterCHRLENGTHS has 93 mapped keys (of 93 keys)"
>>     [10] "hugene10sttranscriptclusterCHRLOC has 19424 mapped keys (of 33297 keys)"
>>     [11] "hugene10sttranscriptclusterCHRLOCEND has 19424 mapped keys (of 33297 keys)"
>>     [12] "hugene10sttranscriptclusterENSEMBL has 19416 mapped keys (of 33297 keys)"
>>     [13] "hugene10sttranscriptclusterENSEMBL2PROBE has 20590 mapped keys (of 28046 keys)"
>>     [14] "hugene10sttranscriptclusterENTREZID has 19962 mapped keys (of 33297 keys)"
>>     [15] "hugene10sttranscriptclusterENZYME has 2201 mapped keys (of 33297 keys)"
>>     [16] "hugene10sttranscriptclusterENZYME2PROBE has 958 mapped keys (of 975 keys)"
>>     [17] "hugene10sttranscriptclusterGENENAME has 19962 mapped keys (of 33297 keys)"
>>     [18] "hugene10sttranscriptclusterGO has 17412 mapped keys (of 33297 keys)"
>>     [19] "hugene10sttranscriptclusterGO2ALLPROBES has 17930 mapped keys (of 18078 keys)"
>>     [20] "hugene10sttranscriptclusterGO2PROBE has 13970 mapped keys (of 14134 keys)"
>>     [21] "hugene10sttranscriptclusterMAP has 19832 mapped keys (of 33297 keys)"
>>     [22] "hugene10sttranscriptclusterOMIM has 13778 mapped keys (of 33297 keys)"
>>     [23] "hugene10sttranscriptclusterPATH has 5768 mapped keys (of 33297 keys)"
>>     [24] "hugene10sttranscriptclusterPATH2PROBE has 229 mapped keys (of 229 keys)"
>>     [25] "hugene10sttranscriptclusterPFAM has 18146 mapped keys (of 33297 keys)"
>>     [26] "hugene10sttranscriptclusterPMID has 19726 mapped keys (of 33297 keys)"
>>     [27] "hugene10sttranscriptclusterPMID2PROBE has 396421 mapped keys (of 412133 keys)"
>>     [28] "hugene10sttranscriptclusterPROSITE has 18146 mapped keys (of 33297 keys)"
>>     [29] "hugene10sttranscriptclusterREFSEQ has 19873 mapped keys (of 33297 keys)"
>>     [30] "hugene10sttranscriptclusterSYMBOL has 19962 mapped keys (of 33297 keys)"
>>     [31] "hugene10sttranscriptclusterUNIGENE has 19578 mapped keys (of 33297 keys)"
>>     [32] "hugene10sttranscriptclusterUNIPROT has 18193 mapped keys (of 33297 keys)"
>>     [33] ""
>>     [34] ""
>>     [35] "Additional Information about this package:"
>>     [36] ""
>>     [37] "DB schema: HUMANCHIP_DB"
>>     [38] "DB schema version: 2.1"
>>     [39] "Organism: Homo sapiens"
>>     [40] "Date for NCBI data: 2014-Mar13"
>>     [41] "Date for GO data: 20140308"
>>     [42] "Date for KEGG data: 2011-Mar15"
>>     [43] "Date for Golden Path data: 2010-Mar22"
>>     [44] "Date for Ensembl data: 2014-Feb26"
>>
>>     It seems something is broken there, as entry [6] of the output shows:
>>      [6] "hugene10sttranscriptclusterACCNUM has 0 mapped keys (of 33297 keys)"
>>
>>     Any ideas on how to solve this? Or whether this is a bug on my
>>     side or on the package side?
>>
>>     Kind Regards
>>
>>     Thomas
>>
>>
>>     -- 
>>     Université du Luxembourg
>>     Faculté des Sciences, de la Technologie et de la Communication
>>     Campus Limpertsberg, BRB 2.13
>>     162a, avenue de la Faïencerie
>>     L-1511 Luxembourg
>>     Email: thomas.pfau at uni.lu
>>
>>
>>
>>
>>
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>
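
The two builds Jim describes above differ only in which file is passed as
fileName and which as otherSrc. A minimal sketch of that, assuming
AnnotationForge and human.db0 are installed: the file names, version string,
chip name, and the helper build_chip_db() are hypothetical and not taken from
Jim's actual build script.

```r
# Hypothetical input files: two-column, tab-delimited, no header.
gene_id_file   <- "probe2entrez.txt"     # probe ID -> Entrez Gene ID
accession_file <- "probe2accession.txt"  # probe ID -> GenBank/RefSeq accession

# Hypothetical wrapper so that only the file order and baseMapType vary.
build_chip_db <- function(primary, other, base_map_type, out_dir = tempdir()) {
  AnnotationForge::makeDBPackage(
    "HUMANCHIP_DB",
    affy = FALSE,
    prefix = "hugene10sttranscriptcluster",
    fileName = primary,          # primary keys; decides whether ACCNUM is filled
    otherSrc = other,            # supplementary ID list
    baseMapType = base_map_type, # "eg" for Gene IDs, "gbNRef" for GenBank/RefSeq
    outputDir = out_dir,
    version = "0.0.1",           # placeholder version
    manufacturer = "Affymetrix",
    chipName = "Human Gene 1.0 ST (transcript cluster)",
    manufacturerUrl = "http://www.affymetrix.com"
  )
}

if (requireNamespace("AnnotationForge", quietly = TRUE) &&
    file.exists(gene_id_file) && file.exists(accession_file)) {
  # Gene ID file first: the run that reported "Found 0 Probe Accessions"
  build_chip_db(gene_id_file, accession_file, "eg")
  # Reversed order (the run that reported probe accessions) would be:
  # build_chip_db(accession_file, gene_id_file, "gbNRef")
}
```

Following Marc's advice, the reversed call is left commented out: it fills
ACCNUM, but only by treating accessions as primary keys on a chip that was
not designed around them.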

-- 
Université du Luxembourg
Faculté des Sciences, de la Technologie et de la Communication
Campus Limpertsberg, BRB 2.13
162a, avenue de la Faïencerie
L-1511 Luxembourg
Email: thomas.pfau at uni.lu
