[BioC] Genbank to Unigene IDs

James MacDonald jmacdon at med.umich.edu
Thu Apr 15 16:52:26 CEST 2004

You probably need to update your AnnBuilder. A recent version was using
the system temp directory instead of the AnnBuilder temp directory,
which didn't work well on Win32. AFAIK, the current devel version of
AnnBuilder has been rolled back to use the AnnBuilder temp dir.

As an aside, if all you need is GB -> UG mappings, it is probably
overkill to use ABPkgBuilder in this way, since it will also parse
LocusLink and KEGG (which takes some time). There are two alternatives
that I can think of (both untested by me). First, use ABPkgBuilder, but
only parse UG by changing the srcUrls argument to:

mySrcUrls <- getSrcUrl(src = "UG", organism = "human")

Another possibility is to use the UG class directly. See ?UG.



James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109

>>> "Dave Waddell" <dwaddell at nutecsciences.com> 04/15/04 10:37AM >>>
I tried running this but got an error:
> library(AnnBuilder)
> myBaseType <- "gb"
> myDir <- "C:/Temp"
> myBase <- "C:/Temp/tempFile.txt"
> mySrcUrls <- getSrcUrl(src = "ALL",organism  = "human")
> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
+      myBaseType, pkgName = "Hum_Agi1A", pkgPath = myDir,organism =
+      "human",  version = "1.0",
+      makeXML = TRUE, author = list(author = "dpritch", maintainer =
+      "dpritch at u.washington.edu"), fromWeb = TRUE)
[1] "It may take me a while to process the data. Be patient!"
Warning message: 
cannot open file `C:/R/rw1090beta/library/AnnBuilder/temp/tempOut31783'

Error in unifyMappings(base, ll, ug, otherSrc, fromWeb) : 
        Failed to get or parse LocusLink data because of:

 Error in file(file, "r") : unable to open connection

I had changed this directory from "Read Only" and checked that I had
permissions from within R:
> setwd("C:/R/rw1090beta/library/AnnBuilder/temp")
> dir()
[1] "file24842Tgo.xml" "README"          
> write("Hello")
> dir()
[1] "data"             "file24842Tgo.xml" "README"
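
The manual check above can also be scripted. A minimal sketch in base R, assuming a hypothetical helper named canWriteTo (it is not part of AnnBuilder): it tries to create a scratch file in the directory and removes it again.

```r
# Hypothetical helper, not an AnnBuilder function: returns TRUE if a
# scratch file can be created in 'dir', FALSE otherwise.
canWriteTo <- function(dir) {
    probe <- tempfile(pattern = "probe", tmpdir = dir)
    # file.create() warns and returns FALSE on failure; silence the warning
    ok <- suppressWarnings(file.create(probe))
    if (ok) unlink(probe)   # clean up the scratch file
    ok
}

# Example call on the session temp directory; substitute the AnnBuilder
# temp directory path to test that one instead.
canWriteTo(tempdir())
```

A write test like this only shows that R can create files in the directory; it says nothing about which directory a package actually writes to.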

I get the same error if I run 

Any suggestions?

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch 
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of A.J.
Sent: Thursday, April 15, 2004 8:48 AM
To: Gordon Smyth
Cc: BioC Mailing List
Subject: Re: [BioC] Genbank to Unigene IDs

Gordon Smyth <smyth at wehi.edu.au> writes:

> I have a list of GenBank IDs for which I'd like the corresponding
> Unigene cluster IDs. What is the easiest way to do this using
> Bioconductor functions? (I've scanned annotate and AnnBuilder help
> vignettes, although way too quickly.)
> For the sake of being specific, here's a concrete example. What's
> Unigene for GB="NM_004551"?

Here's what I'd do (more of a chip-style analysis than instant
WWW-based gratification, which might also be possible):

1. First create a tab-separated 2 column file: first column dummy
probe IDs (could be real or not), second column GB IDs.  So, you'd have
1 row in a file called "Dummy.tsv"

1    NM_004551

2.  Have a script similar to:

myBaseType <- "gb"
# myDir maps the directory where you want the data package built ---
# obviously this should be changed for the directory structure on the
# linux box
myDir <- "C:/DavidsData/Annotation_Folders"

# myBase maps the file that contains the mapping of Agilent feature
# numbers to GenBank ID's
myBase <- "C:/DavidsData/Annotation_Folders/Dummy.tsv"

#use AnnBuilder internal lists of data sources
mySrcUrls <- getSrcUrl(src = "ALL",organism  = "human")

#invoke ABPkgBuilder
ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
             baseMapType = myBaseType, pkgName = "Hum_Agi1A",
             pkgPath = myDir, organism = "human", version = "1.0",
             makeXML = TRUE,
             author = list(author = "dpritch",
                           maintainer = "dpritch at u.washington.edu"),
             fromWeb = TRUE)

3. install the package environment

4. use it to find the IDs (can verify the ID mapping with the XML
output file, as well)
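
The base file from step 1 can be written from R rather than by hand. A minimal sketch in base R, assuming the file should have no header row (as in the one-row example in step 1); it writes to the session temp directory, so change the path for real use. The name baseFile is only for illustration.

```r
# Sketch only: build the two-column, tab-separated base file from step 1.
# Column 1 is a dummy probe ID, column 2 the GenBank accession.
baseFile <- file.path(tempdir(), "Dummy.tsv")   # illustrative location

dummy <- data.frame(probe = "1",
                    gb = "NM_004551",
                    stringsAsFactors = FALSE)

# No quotes, no row names, no header: one "probe<TAB>accession" line per ID
write.table(dummy, file = baseFile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(baseFile)
```

To map more accessions, add rows to the data frame; baseFile then takes the place of myBase in the script from step 2.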


rossini at u.washington.edu           
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email

