[BioC] question about TranscriptDb

Ryan C. Thompson rct at thompsonclan.org
Mon Dec 10 21:25:46 CET 2012


I have also been bitten by the fact that some transcripts are missing 
gene IDs. Is it possible to add placeholder gene IDs to these? For 
example, just assigning them UNKNOWN1, UNKNOWN2, etc.?
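
To make the request concrete, here is a minimal sketch of the kind of
placeholder assignment I have in mind. It is plain base R on a made-up
table, not anything GenomicFeatures does today: the helper name
fill_unknown_gene_ids, the "UNKNOWN" prefix argument, and the toy
transcript names and gene IDs are all hypothetical.

```r
# Toy table standing in for transcript metadata; names and IDs are made up.
tx <- data.frame(
  tx_name = c("tx_a", "tx_b", "tx_c", "tx_d"),
  gene_id = c("1001", NA, "1001", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each missing gene ID with UNKNOWN1, UNKNOWN2, ...
# numbered in the order the missing entries appear.
fill_unknown_gene_ids <- function(gene_id, prefix = "UNKNOWN") {
  missing <- is.na(gene_id)
  gene_id[missing] <- paste0(prefix, seq_len(sum(missing)))
  gene_id
}

tx$gene_id <- fill_unknown_gene_ids(tx$gene_id)
print(tx)
```

Note that this gives every unannotated transcript its own placeholder, so
two transcripts from the same unannotated locus would not end up grouped
under one gene.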

On Mon 10 Dec 2012 11:40:35 AM PST, Marc Carlson wrote:
> Hi Matthew,
>
> Thanks for your detailed exploration of this. After looking more
> closely, I think the confusion arises because you are looking at the
> kgXref table, whereas the gene IDs in the TxDb database were actually
> attached from the knownToLocusLink
> <http://genome.ucsc.edu/cgi-bin/hgTables?hgsid=316115443&hgta_doSchemaDb=hg19&hgta_doSchemaTable=knownToLocusLink>
> table.  Adding to the mayhem, UCSC has apparently decided to allow
> different values to exist in the latest versions of these two tables.
>
> We chose to use the Entrez Gene IDs as gene identifiers because (unlike
> gene symbols) they represent a real identifier and can thus be relied on
> to not have multiple different meanings etc.
>
>
>    Marc
>
>
>
> On 12/10/2012 09:06 AM, Matthew D. Wilkerson wrote:
>> Hello,
>>
>> I have a question about the gene_id attribute of
>> TxDb.Hsapiens.UCSC.hg19.knownGene, version 2.80 (latest).
>>
>> I noticed that some transcripts such as uc021ums.1, do not have an
>> associated gene_id.
>>
>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene  # the package exports its TxDb object under this name
>> t=transcripts(txdb,columns=c("gene_id","tx_id","tx_name","cds_id","cds_name"))
>>
>> t[ which(elementMetadata(t)[,"tx_name"]=="uc021ums.1"), ]
>>
>> I understand that some ucsc genes might not have an entrez gene id
>> associated.
>> I checked this locus and found that currently UCSC db does have this
>> locus associated with LINGO3.
>>
>> hg19.knownGene.name        uc021ums.1
>> hg19.knownGene.chrom       chr19
>> hg19.knownGene.strand      -
>> hg19.knownGene.txStart     2289996
>> hg19.knownGene.txEnd       2291775
>> hg19.knownGene.cdsStart    2289996
>> hg19.knownGene.cdsEnd      2291775
>> hg19.knownGene.exonCount   1
>> hg19.knownGene.exonStarts  2289996,
>> hg19.knownGene.exonEnds    2291775,
>> hg19.knownGene.proteinID   P0C6S8
>> hg19.knownGene.alignID     uc021ums.1
>> hg19.kgXref.kgID           uc021ums.1
>> hg19.kgXref.geneSymbol     LINGO3
>>
>>
>> The kgXref table was last updated  2/5/12.
>>
>>
>> The bioconductor package was made on:
>> Creation time: 2012-09-10 12:56:25 -0700 (Mon, 10 Sep 2012)
>>
>> If this date also refers to the date of download, then why is this
>> transcript not affiliated with LINGO3?
>> If not, then what date does known gene refer to?
>>
>>
>> Thanks,
>> Matt
>>
>
>


