[BioC] Genbank to Unigene IDs

Thu Apr 15 19:15:29 CEST 2004

That is closer, thanks. 
library(AnnBuilder)
myBaseType <- "gb"
myDir <- "C:/Temp"
myBase <- "C:/Temp/tempFile.txt"
mySrcUrl <- getSrcUrl("UG")
ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
             baseMapType = myBaseType, pkgName = "Hum_Agi1A",
             pkgPath = myDir, organism = "human", version = "1.0",
             makeXML = TRUE,
             author = list(author = "dpritch",
                           maintainer = "dpritch at u.washington.edu"),
             fromWeb = TRUE)

I see that this is now writing to
C:/R/rw1090beta/library/AnnBuilder/data/
instead of temp, and I also had to unzip
C:/R/rw1090beta/library/AnnBuilder/data/Rdata.zip
because it couldn't find AnnInfo:
"In addition: Warning message: 
cannot open file `C:/R/rw1090beta/library/AnnBuilder/data/AnnInfo'"

Also the example still fails with
Error in file(file, "r") : unable to open connection
In addition: Warning message: 
cannot open file `C:/R/rw1090beta/library/AnnBuilder/temp/tempOut27202' 
Error in unifyMappings(base, ll, ug, otherSrc, fromWeb) : 
        Failed to get or parse LocusLink data because of:

 Error in file(file, "r") : unable to open connection

I also get a few Perl warning messages:

Scalar value @vals[1] better written as $vals[1] at
C:\R\rw1090beta\library\AnnBuilder\temp\tempPerl28396.pl line 16.
Use of uninitialized value in split at
C:\R\rw1090beta\library\AnnBuilder\temp\tempPerl28396.pl line 16, <BASE> line 1.

Use of uninitialized value in split at
C:\R\rw1090beta\library\AnnBuilder\temp\tempPerl23771.pl line 16, <BASE> line 1.

Useless use of a variable in void context at
C:\R\rw1090beta\library\AnnBuilder\temp\tempPerl7801.pl line 38.

It also downloaded LL_tmpl.gz twice (and refGene.txt.gz, refLink.txt.gz,
Hs.data.gz, and go_200403-termdb.xml.gz once each) and finally failed after
45 minutes with:
> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
myBaseType, pkgName = "Hum_Agi1A", pkgPath = myDir,organism = "human",
version = "1.0", makeXML = TRUE, author = list(author = "dpritch",
maintainer ="dpritch at u.washington.edu"), fromWeb = TRUE)
[1] "It may take me a while to process the data. Be patient!"
Error in url(paste(srcUrl, exten, sep = ""), "r") : 
        unable to open connection

Examining the failed command gives:
> url(paste(srcUrl, exten, sep = ""), "r")
Error in paste(srcUrl, exten, sep = "") : Object "exten" not found
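
A note on the "exten" message: it probably does not point at the real
failure. exten looks like a variable that is local to the AnnBuilder
function which called url(), so pasting that line at the prompt fails for
a different reason (the variable is out of scope there). The base R sketch
below illustrates this with a hypothetical stand-in function,
fetch_source(), and a made-up URL; neither is part of AnnBuilder.

```r
# Hypothetical stand-in for the failing internal code (not AnnBuilder's real source).
fetch_source <- function(srcUrl) {
  exten <- "/missing-file.gz"              # local variable, made up for illustration
  target <- paste(srcUrl, exten, sep = "")
  # Simulate the connection failure and report the full URL that was tried.
  stop("unable to open connection: ", target)
}

# Capture the error message instead of letting it abort the session.
msg <- tryCatch(fetch_source("http://example.org"),
                error = function(e) conditionMessage(e))

# 'exten' exists only inside fetch_source(), so it is not visible at top level.
exten_visible <- exists("exten")

print(msg)
print(exten_visible)
```

After a real failure, calling traceback() at the prompt lists the calls
that led to the error, which is usually a better starting point than
re-running a single internal line.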

Has anyone got this running in Windows?
Dave.

-----Original Message-----
From: James MacDonald [mailto:jmacdon at med.umich.edu] 
Sent: Thursday, April 15, 2004 9:52 AM
To: dwaddell at nutecsciences.com; bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] Genbank to Unigene IDs

You probably need to update your AnnBuilder. A recent version was using
the system temp directory instead of the AnnBuilder temp directory,
which didn't work well on Win32. AFAIK, the current devel version of
AnnBuilder has been rolled back to use the AnnBuilder temp dir.

As an aside, if all you need is GB -> UG mappings, it is probably
overkill to use ABPkgBuilder in this way, which is going to parse
LocusLink and KEGG as well (which takes some time). There are two
alternatives that I can think of (both untested by me). First, use
ABPkgBuilder, but only parse UG by changing the srcUrl to:

mySrcUrl <- getSrcUrl("UG")

Another possibility is to use the UG class directly. See ?UG.

Best,

Jim

James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

>>> "Dave Waddell" <dwaddell at nutecsciences.com> 04/15/04 10:37AM >>>
I tried running this but got an error:
> library(AnnBuilder)
> myBaseType <- "gb"
> myDir <- "C:/Temp"
> myBase <- "C:/Temp/tempFile.txt"
> mySrcUrls <- getSrcUrl(src = "ALL",organism  = "human")
> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
+      myBaseType, pkgName = "Hum_Agi1A", pkgPath = myDir,organism =
+      "human",  version = "1.0",
+      makeXML = TRUE, author = list(author = "dpritch", maintainer =
+      "dpritch at u.washington.edu"), fromWeb = TRUE)
[1] "It may take me a while to process the data. Be patient!"
Warning message: 
cannot open file `C:/R/rw1090beta/library/AnnBuilder/temp/tempOut31783'

Error in unifyMappings(base, ll, ug, otherSrc, fromWeb) : 
        Failed to get or parse LocusLink data because of:

 Error in file(file, "r") : unable to open connection

I had changed this directory from "Read Only" and checked that I had write
permissions from within R:
> setwd("C:/R/rw1090beta/library/AnnBuilder/temp")
> dir()
[1] "file24842Tgo.xml" "README"          
> write("Hello")
> dir()
[1] "data"             "file24842Tgo.xml" "README"
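
The check above works because write() with no file argument writes to a
file named "data" in the working directory, which is why "data" shows up
in the second dir() listing. A slightly tidier variant, sketched here in
base R, creates a probe file and removes it again. The helper name
can_write_dir() is made up for this example, and tempdir() stands in for
the AnnBuilder temp directory.

```r
# Return TRUE if a file can actually be created in 'dir', FALSE otherwise.
can_write_dir <- function(dir) {
  if (!file.exists(dir)) return(FALSE)
  probe <- tempfile(pattern = "probe", tmpdir = dir)
  ok <- tryCatch(file.create(probe),
                 warning = function(w) FALSE,
                 error = function(e) FALSE)
  if (ok) unlink(probe)   # remove the probe so nothing is left behind
  ok
}

# tempdir() is used so the sketch runs anywhere; in this thread the directory
# of interest was C:/R/rw1090beta/library/AnnBuilder/temp.
writable <- can_write_dir(tempdir())
print(writable)
```

A TRUE result only shows that the directory is writable; it does not rule
out the package looking for its temp files somewhere else.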

I get the same error if I run 
example("ABPkgBuilder")

Any suggestions?

Dave.
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch 
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of A.J. Rossini
Sent: Thursday, April 15, 2004 8:48 AM
To: Gordon Smyth
Cc: BioC Mailing List
Subject: Re: [BioC] Genbank to Unigene IDs

Gordon Smyth <smyth at wehi.edu.au> writes:

> I have a list of GenBank IDs for which I'd like the corresponding
> Unigene cluster IDs. What is the easiest way to do this using
> Bioconductor functions? (I've scanned annotate and AnnBuilder help and
> vignettes, although way too quickly.)
>
> For the sake of being specific, here's a concrete example. What's
> Unigene for GB="NM_004551"?

Here's what I'd do (more of a chip-style analysis than instant
WWW-based gratification, which might also be possible):

1. First create a tab-separated 2 column file: first column dummy
probe IDs (could be real or not), second column GB IDs.  So, you'd have
1 row in a file called "Dummy.tsv"

1    NM_004551

2.  Have a script similar to:

library(AnnBuilder)
myBaseType <- "gb"
# myDir maps the directory where you want the data package built ---
# obviously this should be changed for the directory structure on the
# linux box
myDir <- "C:/DavidsData/Annotation_Folders"

# myBase maps the file that contains the mapping of Agilent feature
# numbers to GenBank ID's
myBase <- "C:/DavidsData/Annotation_Folders/Dummy.tsv"

#use AnnBuilder internal lists of data sources
mySrcUrls <- getSrcUrl(src = "ALL",organism  = "human")

#invoke ABPkgBuilder
ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
             baseMapType = myBaseType, pkgName = "Hum_Agi1A",
             pkgPath = myDir, organism = "human", version = "1.0",
             makeXML = TRUE,
             author = list(author = "dpritch",
                           maintainer = "dpritch at u.washington.edu"),
             fromWeb = TRUE)

3. install the package environment

4. use it to find the IDs (can verify the ID mapping with the XML
output file, as well)
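
For step 1, the base file can be written from R itself. This is a minimal
base R sketch, assuming the file needs no header line (as in the one-row
example above). The column names probe and gb are placeholders that are
not written to the file, and tempdir() is used only so the sketch runs
anywhere.

```r
# Write the one-row, two-column base file described in step 1.
base_file <- file.path(tempdir(), "Dummy.tsv")
base_map <- data.frame(probe = "1", gb = "NM_004551",
                       stringsAsFactors = FALSE)
write.table(base_map, file = base_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back to confirm the layout:
# column 1 = dummy probe ID, column 2 = GenBank ID.
check <- read.delim(base_file, header = FALSE, colClasses = "character",
                    stringsAsFactors = FALSE)
print(check)
```

Point myBase at the resulting file (or write it straight to the directory
you use for myDir) before calling ABPkgBuilder.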

best,
-tony

-- 
rossini at u.washington.edu           
http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
