[BioC] annotate and GenBank Accession ID

Marc Carlson mcarlson at fhcrc.org
Fri Jan 14 21:21:51 CET 2011


Hi Thomas,

You can get GenBank IDs, but they are mixed in with other kinds of
accessions inside the ACCNUM mappings.

If you just want all possible accessions, you can do it like this:
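library(hugene10sttranscriptcluster.db)
library(org.Hs.eg.db)
prbs = c("8180408","8180388")
## transcript cluster IDs -> Entrez Gene IDs
egs = unlist2(mget(prbs, hugene10sttranscriptclusterENTREZID,
                   ifnotfound=NA))
## Entrez Gene IDs -> all accessions for those genes
mget(egs, org.Hs.egACCNUM, ifnotfound=NA)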

## OR a more tabular way:
prbs = c("8180408","8180388")
merge(toTable(hugene10sttranscriptclusterENTREZID[prbs]),
      toTable(org.Hs.egACCNUM))
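
As a rough sketch of what that merge() call is doing, here is the same
join on two small made-up tables in plain R. The IDs are invented, and
the column names (probe_id, gene_id, accession) are only what I would
expect toTable() to return, so treat them as an assumption:

```r
## Hypothetical stand-in for toTable(<chip>ENTREZID[prbs])
probe2gene <- data.frame(probe_id = c("p1", "p2"),
                         gene_id  = c("10", "20"),
                         stringsAsFactors = FALSE)

## Hypothetical stand-in for toTable(org.Hs.egACCNUM)
gene2acc <- data.frame(gene_id   = c("10", "10", "20", "30"),
                       accession = c("AB000001", "NM_000001",
                                     "BC000002", "AB000003"),
                       stringsAsFactors = FALSE)

## merge() joins on the column the two tables share (gene_id here),
## so each probe ends up with one row per accession of its gene.
tab <- merge(probe2gene, gene2acc)
```

Gene "30" has no probe in the first table, so it drops out of the
result.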

Regardless of which method you use, the result is the same and will net
you all the accessions for these genes.  At this point, it is pretty
easy to separate out the RefSeq IDs (for example) by grepping for "_",
since RefSeq accessions contain an underscore and GenBank accessions
do not.
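
A minimal sketch of that filtering step in plain R, using a made-up list
in place of the real mget() result (the accessions below are invented
and drop_refseq is just a hypothetical helper name):

```r
## Hypothetical stand-in for mget(egs, org.Hs.egACCNUM, ifnotfound=NA):
## a named list with one character vector of accessions per gene.
acc_list <- list("10" = c("AB000001", "NM_000001", "NP_000001"),
                 "20" = c("BC000002", "XM_000002"),
                 "30" = NA_character_)

## Keep only accessions without an underscore (i.e. drop RefSeq IDs),
## and drop NA placeholders left by ifnotfound=NA.
drop_refseq <- function(x) x[!is.na(x) & !grepl("_", x, fixed = TRUE)]

genbank_like <- lapply(acc_list, drop_refseq)
```

Flipping the test (keeping the ones that do match "_") would give you
just the RefSeq IDs instead.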

Please note, however, that you want to use the ACCNUM mapping from
org.Hs.eg to get this.  This mapping is NOT the same as the ACCNUM
mapping in your chip package.  That is because the one in your chip
package only holds accessions that were used to actually make the
association between the transcript IDs and the Entrez Gene IDs in that
package.  This makes it conceptually a bit different from all the other
mappings.  In contrast, the one in org.Hs.eg holds all of the accessions
that are affiliated with a given gene.

Hope that this helps you,


  Marc



On 01/13/2011 01:25 PM, Thomas Hampton wrote:
>
> library(hugene10sttranscriptcluster.db)
>
> > ls("package:hugene10sttranscriptcluster.db")
> [1] "hugene10sttranscriptcluster"
> [2] "hugene10sttranscriptclusterACCNUM"
> [3] "hugene10sttranscriptclusterALIAS2PROBE"
> [4] "hugene10sttranscriptclusterCHR"
> [5] "hugene10sttranscriptclusterCHRLENGTHS"
> [6] "hugene10sttranscriptclusterCHRLOC"
> [7] "hugene10sttranscriptclusterCHRLOCEND"
> [8] "hugene10sttranscriptcluster_dbconn"
> [9] "hugene10sttranscriptcluster_dbfile"
> [10] "hugene10sttranscriptcluster_dbInfo"
> [11] "hugene10sttranscriptcluster_dbschema"
> [12] "hugene10sttranscriptclusterENSEMBL"
> [13] "hugene10sttranscriptclusterENSEMBL2PROBE"
> [14] "hugene10sttranscriptclusterENTREZID"
> [15] "hugene10sttranscriptclusterENZYME"
> [16] "hugene10sttranscriptclusterENZYME2PROBE"
> [17] "hugene10sttranscriptclusterGENENAME"
> [18] "hugene10sttranscriptclusterGO"
> [19] "hugene10sttranscriptclusterGO2ALLPROBES"
> [20] "hugene10sttranscriptclusterGO2PROBE"
> [21] "hugene10sttranscriptclusterMAP"
> [22] "hugene10sttranscriptclusterMAPCOUNTS"
> [23] "hugene10sttranscriptclusterOMIM"
> [24] "hugene10sttranscriptclusterORGANISM"
> [25] "hugene10sttranscriptclusterORGPKG"
> [26] "hugene10sttranscriptclusterPATH"
> [27] "hugene10sttranscriptclusterPATH2PROBE"
> [28] "hugene10sttranscriptclusterPFAM"
> [29] "hugene10sttranscriptclusterPMID"
> [30] "hugene10sttranscriptclusterPMID2PROBE"
> [31] "hugene10sttranscriptclusterPROSITE"
> [32] "hugene10sttranscriptclusterREFSEQ"
> [33] "hugene10sttranscriptclusterSYMBOL"
> [34] "hugene10sttranscriptclusterUNIGENE"
> [35] "hugene10sttranscriptclusterUNIPROT"
>
> Above, I note that GenBank IDs are not directly supported for my
> chip. Darn.
>
> I am therefore using an online converter to map a list of GenBank
> accessions to ENTREZ gene ids because it looks tricky to use
> the entrezGeneByID function in annotate for a list of IDs.
>
> Am I missing something?
>
> Best,
>
> Tom
>


