[BioC] Genbank to Unigene IDs

A.J. Rossini rossini at blindglobe.net
Thu Apr 15 23:31:54 CEST 2004


Dave -

Sorry to have led you on a wild goose chase.  We've been much more
successful with Linux builds. One solution was to work from pre-downloaded
files, but I can't quickly find the mini-script we used for that
(it removed the download hassle, especially if you are working with
genes which probably aren't changing).
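
Since I can't find the original, here is a minimal sketch of the idea in
plain R, not the script itself. cachePath() and fetchOnce() are
hypothetical helper names, and the cache directory is just an example.
My understanding is that fromWeb = FALSE is how ABPkgBuilder is told to
read local files instead of URLs, but I have not rechecked how the local
paths need to be passed, so treat that part as an assumption.

```r
# Hypothetical helpers (not part of AnnBuilder): keep one local copy of
# each source file so repeated builds do not have to hit the network.
cachePath <- function(url, cacheDir) {
  # Local file name is simply the last component of the URL.
  file.path(cacheDir, basename(url))
}

fetchOnce <- function(url, cacheDir) {
  dest <- cachePath(url, cacheDir)
  if (!file.exists(dest)) {
    dir.create(cacheDir, recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" keeps gzipped files intact on Windows.
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# UniGene URL as reported by getSrcUrl() later in this thread.
ugUrl <- "ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz"
localUg <- cachePath(ugUrl, "C:/Temp/srcCache")

# To actually download (needs network access):
# fetchOnce(ugUrl, "C:/Temp/srcCache")
```

The download only happens the first time; later runs reuse the cached
file, which is fine as long as the source data isn't changing under you.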


"Dave Waddell" <dwaddell at nutecsciences.com> writes:

> The output from:
> mySrcUrl <- getSrcUrl("UG")
> is
>> mySrcUrl
> [1] "ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz"
> this is rejected by ABPkgBuilder:
> "Error in toupper(x) : non-character argument to toupper()"
>
> when getSrcUrl has the ALL argument it gives:
> mySrcUrls <- getSrcUrl(src = "ALL",organism  = "human")
>> mySrcUrls
>  
> LL 
>  
> "ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL_tmpl.gz" 
>  
> GP 
>  
> "http://www.genome.ucsc.edu/goldenPath/hg16/database/" 
>  
> UG 
>  
> "ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz" 
>  
> GO 
> "http://www.godatabase.org/dev/database/archive/2004-03-01/go_200403-termdb.xml.gz" 
>  
> KEGG 
>  
> "ftp://ftp.genome.ad.jp/pub/kegg/pathways" 
>  
> YG 
>  
> "http://www.yeastgenome.org/DownloadContents.shtml" 
>  
> HG 
>  
> "ftp://ftp.ncbi.nih.gov/pub/HomoloGene/hmlg.ftp"
> So I thought I might cheat and use:
> mySrcUrl <- mySrcUrls[3]
>> mySrcUrls[3]
>                                                     UG 
> "ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz"
>
> As you can see this gets rejected as well:
> Error in loadFromUrl(srcUrl(object), dist) : 
>         URL NA is incorrect or the target site is not responding!
> Error in unifyMappings(base, ll, ug, otherSrc, fromWeb) : 
>         Failed to get or parse UniGene data becaus of:
>
>  Error in loadFromUrl(srcUrl(object), dist) : 
>         URL NA is incorrect or the target site is not responding!
> Is it possible to use Annotation that was created on Linux in the Windows
> environment? If so, does anyone want to donate it?
> Thanks, Dave.
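
One guess about the "URL NA" message, sketched in plain R. I have not
checked the AnnBuilder source, so the idea that it looks up each source
by name (e.g. "LL") is an assumption. If it does, a vector holding only
the UG entry would return NA for every other source:

```r
# Two of the entries returned by getSrcUrl(src = "ALL", organism = "human")
# as shown above, rebuilt by hand so this runs without AnnBuilder.
mySrcUrls <- c(
  LL = "ftp://ftp.ncbi.nih.gov/refseq/LocusLink/LL_tmpl.gz",
  UG = "ftp://ftp.ncbi.nih.gov/repository/UniGene/Hs.data.gz"
)

# Single-bracket subsetting keeps the name, so this is still a named vector.
ugOnly <- mySrcUrls["UG"]

# Asking that vector for a name it no longer has yields NA, not an error.
missingLL <- ugOnly["LL"]
is.na(missingLL)
```

That would at least be consistent with the loader being handed NA as a
URL, but it is only a hypothesis.
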
>
>
> -----Original Message-----
> From: James MacDonald [mailto:jmacdon at med.umich.edu] 
> Sent: Thursday, April 15, 2004 9:52 AM
> To: dwaddell at nutecsciences.com; bioconductor at stat.math.ethz.ch
> Subject: RE: [BioC] Genbank to Unigene IDs
>
> You probably need to update your AnnBuilder. A recent version was using
> the system temp directory instead of the AnnBuilder temp directory,
> which didn't work well on Win32. AFAIK, the current devel version of
> AnnBuilder has been rolled back to use the AnnBuilder temp dir.
>
> As an aside, if all you need is GB -> UG mappings, it is probably
> overkill to use ABPkgBuilder in this way, which is going to parse locus
> link and KEGG also (which takes some time). There are two alternatives
> that I can think of, (both untested by me). First, use ABPkgBuilder, but
> only parse UG by changing the srcUrl to:
>
> mySrcUrl <- getSrcUrl("UG")
>
> Another possibility is to use the UG class directly. See ?UG. 
>
> Best,
>
> Jim
>
>
>
> James W. MacDonald
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
>>>> "Dave Waddell" <dwaddell at nutecsciences.com> 04/15/04 10:37AM >>>
> I tried running this but got an error:
>> library(AnnBuilder)
>> myBaseType <- "gb"
>> myDir <- "C:/Temp"
>> myBase <- "C:/Temp/tempFile.txt"
>> mySrcUrls <- getSrcUrl(src = "ALL",organism  = "human")
>> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
> +      myBaseType, pkgName = "Hum_Agi1A", pkgPath = myDir,organism =
> +      "human",  version = "1.0",
> +      makeXML = TRUE, author = list(author = "dpritch", maintainer =
> +      "dpritch at u.washington.edu"), fromWeb = TRUE)
> [1] "It may take me a while to process the data. Be patient!"
> Warning message: 
> cannot open file `C:/R/rw1090beta/library/AnnBuilder/temp/tempOut31783'
>
> Error in unifyMappings(base, ll, ug, otherSrc, fromWeb) : 
>         Failed to get or parse LocusLink data because of:
>
>  Error in file(file, "r") : unable to open connection
>
> I had changed this directory from "Read Only" and checked that I had write
> permissions from within R:
>> setwd("C:/R/rw1090beta/library/AnnBuilder/temp")
>> dir()
> [1] "file24842Tgo.xml" "README"          
>> write("Hello")
>> dir()
> [1] "data"             "file24842Tgo.xml" "README"
>
> I get the same error if I run 
> example("ABPkgBuilder")
>
> Any suggestions?
>
> Dave.
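
For what it's worth, the permission check above can also be done without
changing the working directory. A small sketch in base R follows;
tempdir() is only a stand-in here for the AnnBuilder temp directory, and
file.access() is not always reliable on Windows, which is why the
scratch-file test is included as well.

```r
# Directory to test; tempdir() stands in for the AnnBuilder temp directory.
dirToCheck <- tempdir()

# file.access() returns 0 when the requested access (mode 2 = write) is allowed.
canWrite <- file.access(dirToCheck, mode = 2) == 0

# More direct check: try to create, then remove, a scratch file there.
scratch <- tempfile(tmpdir = dirToCheck)
created <- file.create(scratch)
unlink(scratch)

canWrite
created
```

If the scratch file can be created but the build still fails, the problem
is more likely which directory is being used than its permissions.
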
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch 
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of A.J. Rossini
> Sent: Thursday, April 15, 2004 8:48 AM
> To: Gordon Smyth
> Cc: BioC Mailing List
> Subject: Re: [BioC] Genbank to Unigene IDs
>
> Gordon Smyth <smyth at wehi.edu.au> writes:
>
>> I have a list of GenBank IDs for which I'd like the corresponding
>> Unigene cluster IDs. What is the easiest way to do this using
>> Bioconductor functions? (I've scanned annotate and AnnBuilder help and
>> vignettes, although way too quickly.)
>>
>> For the sake of being specific, here's a concrete example. What's
>> Unigene for GB="NM_004551"?
>
> Here's what I'd do (more of a chip-style analysis than instant
> WWW-based gratification, which might also be possible):
>
> 1. First create a tab-separated 2 column file: first column dummy
> probe IDs (could be real or not), second column GB IDs.  So, you'd have
> 1 row in a file called "Dummy.tsv"
>
>
>
> 1    NM_004551
>
>
>
>
> 2.  Have a script similar to:
>
>
>
> library(AnnBuilder)
> myBaseType <- "gb"
> # myDir maps the directory where you want the data package built ---
> # obviously this should be changed for the directory structure on the
> # linux box
> myDir <- "C:/DavidsData/Annotation_Folders"
>
> # myBase maps the file that contains the mapping of Agilent feature
> # numbers to GenBank ID's
> myBase <- "C:/DavidsData/Annotation_Folders/Dummy.tsv"
>
> #use AnnBuilder internal lists of data sources
> mySrcUrls <- getSrcUrl(src = "ALL",organism  = "human")
>
> #invoke ABPkgBuilder
> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls,
>              baseMapType = myBaseType, pkgName = "Hum_Agi1A",
>              pkgPath = myDir, organism = "human", version = "1.0",
>              makeXML = TRUE,
>              author = list(author = "dpritch",
>                            maintainer = "dpritch at u.washington.edu"),
>              fromWeb = TRUE)
>
> 3. install the package environment
>
> 4. use it to find the IDs (can verify the ID mapping with the XML
> output file, as well)
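
Step 1 above can be done from R rather than by hand. A minimal sketch,
assuming a single GenBank ID and writing to tempdir() rather than the
real annotation folder; the column names probe and gb are only for
illustration and are not written to the file.

```r
# One dummy probe ID paired with the GenBank ID from the question.
base <- data.frame(probe = "1", gb = "NM_004551", stringsAsFactors = FALSE)

# Write a headerless, unquoted, tab-separated file.
baseFile <- file.path(tempdir(), "Dummy.tsv")
write.table(base, file = baseFile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back to confirm the layout: one line, two tab-separated fields.
lines <- readLines(baseFile)
fields <- strsplit(lines, "\t", fixed = TRUE)[[1]]
```

The resulting path is what would be passed as baseName in step 2.
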
>
> best,
> -tony
>
> -- 
> rossini at u.washington.edu           
> http://www.analytics.washington.edu/ 
> Biomedical and Health Informatics   University of Washington
> Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
> UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
> FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
>

-- 
rossini at u.washington.edu            http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN          Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email
