[BioC] Genbank to Unigene IDs

A.J. Rossini rossini at blindglobe.net
Thu Apr 15 15:48:10 CEST 2004

Gordon Smyth <smyth at wehi.edu.au> writes:

> I have a list of GenBank IDs for which I'd like the corresponding
> Unigene cluster IDs. What is the easiest way to do this using
> Bioconductor functions? (I've scanned annotate and AnnBuilder help and
> vignettes, although way too quickly.)
> For the sake of being specific, here's a concrete example. What's
> Unigene for GB="NM_004551"?

Here's what I'd do (more of a chip-style analysis than instant
WWW-based gratification, which might also be possible):

1. First create a tab-separated, two-column file: the first column
holds dummy probe IDs (could be real or not), the second column the
GB IDs.  So, you'd have one row in a file called "Dummy.tsv":

1    NM_004551
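
If you'd rather generate that file from R than type it by hand, something
like the following should do. It is only a sketch: the names gbIds,
baseTable and baseFile are made up for illustration, and it writes to a
temporary directory rather than the path used by the script in step 2.

```r
# GenBank accessions to look up; extend this vector for a longer list
gbIds <- c("NM_004551")

# Dummy probe IDs: simple running numbers, one per accession
baseTable <- data.frame(probe = seq_along(gbIds), gb = gbIds,
                        stringsAsFactors = FALSE)

# Hypothetical location (an assumption); point this wherever myBase will look
baseFile <- file.path(tempdir(), "Dummy.tsv")

# Tab-separated, no header, no quotes, no row names
write.table(baseTable, file = baseFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to confirm it really has two columns
check <- read.table(baseFile, sep = "\t", header = FALSE,
                    colClasses = "character")
print(check)
```

Reading the file back is cheap and catches stray quotes or a missing tab
before the package build starts.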

2.  Have a script similar to:

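# getSrcUrl() and ABPkgBuilder() come from the AnnBuilder package
library(AnnBuilder)

# the base file maps probe IDs to GenBank IDs
myBaseType <- "gb"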
# myDir maps the directory where you want the data package built ---
# obviously this should be changed for the directory structure on the
# linux box
myDir <- "C:/DavidsData/Annotation_Folders"

# myBase is the file that maps probe IDs (Agilent feature numbers, or
# the dummy IDs from step 1) to GenBank IDs
myBase <- "C:/DavidsData/Annotation_Folders/Dummy.tsv"

# use AnnBuilder's internal lists of data sources
mySrcUrls <- getSrcUrl(src = "ALL", organism = "human")

# invoke ABPkgBuilder
ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
             baseMapType = myBaseType, pkgName = "Hum_Agi1A",
             pkgPath = myDir, organism = "human", version = "1.0",
             makeXML = TRUE,
             author = list(author = "dpritch",
                           maintainer = "dpritch at u.washington.edu"),
             fromWeb = TRUE)

3. Install the resulting package environment.

4. Use it to find the IDs (you can verify the ID mapping against the
XML output file as well).
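
For steps 3 and 4, here is roughly what I have in mind. Treat it as a
sketch under assumptions: I'm assuming the built package exposes an
environment named Hum_Agi1AUNIGENE (package name plus "UNIGENE", as the
chip annotation packages do), and lookupUnigene is a made-up helper, not
part of annotate or AnnBuilder. The stand-in environment and its
"Hs.XXXXXX" value are placeholders so the code runs without the package;
they are not the real answer for NM_004551.

```r
# Hypothetical helper: look up UniGene IDs for probe IDs in an
# annotation environment keyed by probe ID.
lookupUnigene <- function(probeIds, unigeneEnv) {
  # mget() returns a named list; unmapped probes come back as NA
  hits <- mget(as.character(probeIds), envir = unigeneEnv, ifnotfound = NA)
  # a probe may map to several clusters, so collapse them into one string
  vapply(hits, function(x) {
    if (all(is.na(x))) NA_character_ else paste(x, collapse = ";")
  }, character(1))
}

# Stand-in environment so the sketch runs without the built package;
# "Hs.XXXXXX" is a placeholder value, not a real UniGene cluster ID.
demoEnv <- new.env()
assign("1", "Hs.XXXXXX", envir = demoEnv)

# Probe "1" is present, probe "2" is not
result <- lookupUnigene(c("1", "2"), demoEnv)
print(result)

# With the real package installed (for example via R CMD INSTALL on the
# directory that ABPkgBuilder wrote), the call would be something like:
# library(Hum_Agi1A)
# lookupUnigene("1", Hum_Agi1AUNIGENE)   # environment name is an assumption
```

The keys are the dummy probe IDs from Dummy.tsv, so to go from a GenBank
ID to UniGene you first find its dummy ID in that file.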


rossini at u.washington.edu            http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
