[BioC] hugene10sttranscriptclusterACCNUM has no mappings

Marc Carlson mcarlson at fhcrc.org
Thu Sep 4 19:04:39 CEST 2014


Hi Thomas,

You are correct that the current ChipDb packages (as generated by the
makeDBPackage function) are not designed for transcript-level
specificity at all.  They are meant to be gene-centric only.  A popular
Bioconductor object that works at the transcript level would be
something like the TranscriptDb object (which has a different use
case).  It's possible that it's finally time to think about making some
transcript-centric ChipDb-style objects.  And at first blush these
might not even be all that tricky to make and use.  But before we go too
far down that path, I am curious how many platforms would actually
be able to take advantage of that (and how many probes on those
platforms could even detect transcripts with that level of specificity).
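
To make the gene-centric limitation concrete, here is a minimal base R
sketch.  The tables and identifiers (p1, g1, tx1a, ...) are made up for
illustration and do not come from any real annotation package; the point
is only that a probe which reaches transcripts through a gene ID picks up
every transcript of that gene.

```r
# Toy tables with hypothetical identifiers (assumptions, not real data).
probe2gene <- data.frame(
  PROBEID  = c("p1", "p2", "p3"),
  ENTREZID = c("g1", "g1", NA),      # p3 has no gene assignment
  stringsAsFactors = FALSE
)
gene2transcript <- data.frame(
  ENTREZID = c("g1", "g1"),
  REFSEQ   = c("tx1a", "tx1b"),      # two transcripts of the same gene
  stringsAsFactors = FALSE
)

# Gene-centric lookup: probe -> gene -> every transcript of that gene.
# all.x = TRUE keeps probes that have no gene (their REFSEQ becomes NA).
probe2transcript <- merge(probe2gene, gene2transcript,
                          by = "ENTREZID", all.x = TRUE)
probe2transcript <- probe2transcript[order(probe2transcript$PROBEID),
                                     c("PROBEID", "ENTREZID", "REFSEQ")]
print(probe2transcript)

# Number of transcripts reported for each probe.
n_tx <- tapply(!is.na(probe2transcript$REFSEQ),
               probe2transcript$PROBEID, sum)
print(n_tx)
```

Both p1 and p2 come back with the same two transcripts, and p3 with none,
regardless of which transcript each probe was actually designed against.
With a real ChipDb package the analogous lookup would presumably be an
AnnotationDbi select() call asking for the REFSEQ column by PROBEID, and
it should behave the same way since that column is joined through the
Entrez Gene ID.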

  Marc


On 09/03/2014 11:25 PM, Thomas Pfau wrote:
> Hi Marc,
>
> Thanks for the clarification. I just stumbled over this as I read that 
> newer chips often have transcript-specific probes, and since Entrez 
> Gene IDs do not reflect those probes I had kind of hoped that these 
> accessions would allow me a more precise mapping (or at least the 
> potential to then get other database IDs that match the specific 
> transcripts out of the accessions).
> Learning that the accessions are not the way to go, I'm wondering 
> whether there is any linkage to transcripts.
>
> Best,
>
> Thomas
>
> On 09/04/2014 02:38 AM, Marc Carlson wrote:
>> Hi.  Sorry for the delay; I was not in the office for almost a week 
>> (and I left the day before this question popped up).  Part of the 
>> reason for the confusion here is that the ACCNUM field is supposed 
>> to represent the source accessions that were used when designing the 
>> package.  In that sense ACCNUM is kind of an anachronism, since I 
>> don't think people really design chips this way very much anymore, so 
>> the reason that bimap is even present is largely backwards 
>> compatibility more than anything else. This is why the man page for 
>> the ACCNUM mapping says this:
>>
>> "For chip packages such as this, the ACCNUM mapping comes directly 
>> from the manufacturer.  This is different from other mappings which 
>> are mapped onto the probes via an Entrez Gene identifier."
>>
>> Anyhow, the code that builds the ChipDb package proceeds under 
>> the notion that you would only "have" those special ACCNUM values if 
>> they were listed in your primary (fileName) set of keys.  That is, 
>> if the probes are not really based on GenBank accessions then you 
>> don't really have any ACCNUM values anyway, and that field should (in 
>> that case) probably be left out entirely.
>>
>> So if you don't have legitimate ACCNUM values (that is, you are not 
>> dealing with an old chip where these really are the primary initial 
>> keys that everything was based on), then I don't think you should 
>> fake them into the package by including them first.  Because 
>> effectively what you will be doing is inadvertently resurrecting old 
>> retired IDs from the dead.  I mean, yes, you can extract them like 
>> that with old dead accession numbers, but I don't think it's best 
>> practice to do that.  Those IDs were presumably retired for a reason.
>>
>> I hope this helps to explain things better,
>>
>>
>>  Marc
>>
>>
>>
>>
>> On 08/29/2014 07:15 AM, James W. MacDonald wrote:
>>> Hi Thomas,
>>>
>>> I built that package, and as you note, there are no accession 
>>> numbers. But maybe that is because I misunderstand something, so I 
>>> am directly including Marc Carlson in this conversation.
>>>
>>> Since the annotation packages are Gene ID-centric, I create two 
>>> files, one with probeid->GeneID, and one with 
>>> probeid->GenBank/RefSeq ID. I then use the first file as the 
>>> primary annotation file, and the second as the 'otherSrc' file. If I 
>>> then run makeDBPackage(), I get this output:
>>>
>>> baseMapType is eg
>>> Prepending Metadata
>>> Creating Genes table
>>> Appending Probes
>>> Found 0 Probe Accessions
>>> Appending Gene Info
>>> Found 19962 Gene Names
>>> Found 19962 Gene Symbols
>>> <snip>
>>>
>>> But if I then reverse the source files, using the second file as the 
>>> primary annotation file, and the GeneID file as the 'otherSrc' file, 
>>> I get:
>>>
>>> baseMapType is gb or gbNRef
>>> Prepending Metadata
>>> Creating Genes table
>>> Appending Probes
>>> Found 21941 Probe Accessions
>>> Appending Gene Info
>>> Found 20195 Gene Names
>>> Found 20195 Gene Symbols
>>> <snip>
>>>
>>> From my understanding of the SQLForge vignette, I should be able to 
>>> use either ordering, and get identical results, but obviously this 
>>> is not the case. Marc, can you shed some light on this? Evidently I 
>>> should re-make the packages using gbNRef rather than eg as the 
>>> baseMapType.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>> On Fri, Aug 29, 2014 at 4:30 AM, Thomas Pfau <thomas.pfau at uni.lu> wrote:
>>>
>>>     Hello,
>>>
>>>     I just tried to get a probe-to-accession mapping from the above
>>>     annotation database, but it does not yield any
>>>     mappings for accessions, i.e.
>>>     x <- hugene10sttranscriptclusterACCNUM
>>>     mapped_probes <- mappedkeys(x)
>>>     yields an empty mapped_probes list.
>>>
>>>
>>>     I'm running R 3.1.1 on Ubuntu.
>>>     The loaded packages are:
>>>
>>>      [1] oligo_1.28.2 Biostrings_2.32.1 XVector_0.4.0
>>>      [4] IRanges_1.22.10 oligoClasses_1.26.0 hugene10sttranscriptcluster.db_8.1.0
>>>      [7] org.Hs.eg.db_2.14.0 RSQLite_0.11.4 DBI_0.2-7
>>>     [10] AnnotationDbi_1.26.0 GenomeInfoDb_1.0.2 Biobase_2.24.0
>>>     [13] BiocGenerics_0.10.0 BiocInstaller_1.14.2
>>>
>>>     and capture.output(hugene10sttranscriptcluster()) yields:
>>>      [1] "Quality control information for hugene10sttranscriptcluster:"
>>>      [2] ""
>>>      [3] ""
>>>      [4] "This package has the following mappings:"
>>>      [5] ""
>>>      [6] "hugene10sttranscriptclusterACCNUM has 0 mapped keys (of 33297 keys)"
>>>      [7] "hugene10sttranscriptclusterALIAS2PROBE has 60778 mapped keys (of 103510 keys)"
>>>      [8] "hugene10sttranscriptclusterCHR has 19962 mapped keys (of 33297 keys)"
>>>      [9] "hugene10sttranscriptclusterCHRLENGTHS has 93 mapped keys (of 93 keys)"
>>>     [10] "hugene10sttranscriptclusterCHRLOC has 19424 mapped keys (of 33297 keys)"
>>>     [11] "hugene10sttranscriptclusterCHRLOCEND has 19424 mapped keys (of 33297 keys)"
>>>     [12] "hugene10sttranscriptclusterENSEMBL has 19416 mapped keys (of 33297 keys)"
>>>     [13] "hugene10sttranscriptclusterENSEMBL2PROBE has 20590 mapped keys (of 28046 keys)"
>>>     [14] "hugene10sttranscriptclusterENTREZID has 19962 mapped keys (of 33297 keys)"
>>>     [15] "hugene10sttranscriptclusterENZYME has 2201 mapped keys (of 33297 keys)"
>>>     [16] "hugene10sttranscriptclusterENZYME2PROBE has 958 mapped keys (of 975 keys)"
>>>     [17] "hugene10sttranscriptclusterGENENAME has 19962 mapped keys (of 33297 keys)"
>>>     [18] "hugene10sttranscriptclusterGO has 17412 mapped keys (of 33297 keys)"
>>>     [19] "hugene10sttranscriptclusterGO2ALLPROBES has 17930 mapped keys (of 18078 keys)"
>>>     [20] "hugene10sttranscriptclusterGO2PROBE has 13970 mapped keys (of 14134 keys)"
>>>     [21] "hugene10sttranscriptclusterMAP has 19832 mapped keys (of 33297 keys)"
>>>     [22] "hugene10sttranscriptclusterOMIM has 13778 mapped keys (of 33297 keys)"
>>>     [23] "hugene10sttranscriptclusterPATH has 5768 mapped keys (of 33297 keys)"
>>>     [24] "hugene10sttranscriptclusterPATH2PROBE has 229 mapped keys (of 229 keys)"
>>>     [25] "hugene10sttranscriptclusterPFAM has 18146 mapped keys (of 33297 keys)"
>>>     [26] "hugene10sttranscriptclusterPMID has 19726 mapped keys (of 33297 keys)"
>>>     [27] "hugene10sttranscriptclusterPMID2PROBE has 396421 mapped keys (of 412133 keys)"
>>>     [28] "hugene10sttranscriptclusterPROSITE has 18146 mapped keys (of 33297 keys)"
>>>     [29] "hugene10sttranscriptclusterREFSEQ has 19873 mapped keys (of 33297 keys)"
>>>     [30] "hugene10sttranscriptclusterSYMBOL has 19962 mapped keys (of 33297 keys)"
>>>     [31] "hugene10sttranscriptclusterUNIGENE has 19578 mapped keys (of 33297 keys)"
>>>     [32] "hugene10sttranscriptclusterUNIPROT has 18193 mapped keys (of 33297 keys)"
>>>     [33] ""
>>>     [34] ""
>>>     [35] "Additional Information about this package:"
>>>     [36] ""
>>>     [37] "DB schema: HUMANCHIP_DB"
>>>     [38] "DB schema version: 2.1"
>>>     [39] "Organism: Homo sapiens"
>>>     [40] "Date for NCBI data: 2014-Mar13"
>>>     [41] "Date for GO data: 20140308"
>>>     [42] "Date for KEGG data: 2011-Mar15"
>>>     [43] "Date for Golden Path data: 2010-Mar22"
>>>     [44] "Date for Ensembl data: 2014-Feb26"
>>>
>>>     It seems like something is broken there, as shown in line 6:
>>>      [6] "hugene10sttranscriptclusterACCNUM has 0 mapped keys (of 33297 keys)"
>>>
>>>     Any ideas on how to solve this? Or whether this is a bug on my
>>>     side or on the package side?
>>>
>>>     Kind Regards
>>>
>>>     Thomas
>>>
>>>
>>>     -- 
>>>     Université du Luxembourg
>>>     Faculté des Sciences, de la Technologie et de la Communication
>>>     Campus Limpertsberg, BRB 2.13
>>>     162a, avenue de la Faïencerie
>>>     L-1511 Luxembourg
>>>     Email: thomas.pfau at uni.lu
>>>
>>>
>>>
>>>
>>>
>>> -- 
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>
>

