[BioC] Annotation data package is not being created

John Zhang jzhang at jimmy.harvard.edu
Mon Dec 15 19:34:31 MET 2003


>I'm not sure what you mean by saying the program stopped because they are 
>not mapped to "enzyme annotation data"? I don't think you mean that your
>input has to code for enzymes? :)  

No enzyme mappings were obtained for your clones from the source data provided 
by KEGG.

>Could you please clarify the path that ABPkgBuilder takes to get the
>information from databases such as LocusLink, UniGene, Golden Path,
>Gene Ontology, and KEGG?  My sample input file is representative of the
>rest of my data.  I took my clone IDs and went to the I.M.A.G.E. site and
>got the corresponding GenBank accession numbers.  With these numbers I
>want to get the UniGene information.

ABPkgBuilder first maps probe ids to LocusLink ids and then uses the LocusLink 
ids as the point of linkage to get annotation data from the source data. 

Data from UniGene is used only for mapping GenBank accession numbers to 
LocusLink ids. If you only want data from UniGene for your clones, you may 
simply write a parser for the UniGene data rather than building a package with 
AnnBuilder, which includes a lot of things you do not even need.
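
A minimal sketch of such a parser is below. It is not part of AnnBuilder. It 
assumes the UniGene flat file layout in which each cluster record has an "ID" 
line and "SEQUENCE" lines carrying "ACC=" fields, with records ending in "//"; 
please check this against the file you actually download. The function name 
parse_unigene and all sample values are made up for illustration.

```r
# Made-up records in the assumed UniGene flat file layout (not real data).
unigene_lines <- c(
  "ID          Hs.1001",
  "TITLE       Example cluster one",
  "SEQUENCE    ACC=ACC_A; NID=g0000001; CLONE=IMAGE:0000001",
  "SEQUENCE    ACC=ACC_B.1; NID=g0000002",
  "//",
  "ID          Hs.1002",
  "SEQUENCE    ACC=ACC_C.2; NID=g0000003",
  "//"
)

# Hypothetical helper: returns one row per (accession, UniGene cluster) pair.
parse_unigene <- function(lines) {
  cluster <- NA_character_
  acc <- character(0)
  ug <- character(0)
  for (line in lines) {
    if (grepl("^ID[[:space:]]+", line)) {
      # Remember the cluster id until the next record starts.
      cluster <- sub("^ID[[:space:]]+", "", line)
    } else if (grepl("^SEQUENCE[[:space:]]+", line)) {
      m <- regmatches(line, regexpr("ACC=[^;[:space:]]+", line))
      if (length(m) == 1) {
        # Drop the "ACC=" prefix and any ".version" suffix.
        a <- sub("\\.[0-9]+$", "", sub("^ACC=", "", m))
        acc <- c(acc, a)
        ug <- c(ug, cluster)
      }
    }
  }
  data.frame(accession = acc, unigene = ug, stringsAsFactors = FALSE)
}

ug_map <- parse_unigene(unigene_lines)

# Look up the cluster for each accession in a base file.
my_acc <- c("ACC_A", "ACC_C", "ACC_MISSING")
my_ug <- ug_map$unigene[match(my_acc, ug_map$accession)]
```

For a real file you would read the lines with readLines() first. Accessions 
that are not in any cluster come back as NA.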

>
>I looked at your example "How to use AnnBuilder" and looked at the Genbank
>accession numbers and
>compared them to mine.  When I type in for example D90278 into Entrez you
>find that it has a locus and one
>of the link outs is Gene.  However, when I type in one of my Genbank
>accession numbers T65425 into Entrez
>a locus is not available and the only useful link is to unigene.  This will
>be the case for my data. 

AnnBuilder maps GenBank accession numbers by unifying the mappings provided by 
LocusLink, UniGene, and "other sources" specified by a user. If a map is 
provided by either LocusLink or UniGene, that map is assigned to a probe id. 
Otherwise, the map provided by the majority of "other sources" will be assigned 
to a probe id.
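
The rule above can be sketched roughly as follows. This is an illustration 
only, not the code AnnBuilder runs. The function unify_one, the column names, 
and the ids are all made up, and preferring LocusLink over UniGene when both 
give a map is an assumption of the sketch.

```r
# Hypothetical helper: pick one id for a probe from the available maps.
unify_one <- function(ll, ug, others) {
  # A map from LocusLink or UniGene is used directly. Preferring LocusLink
  # when both are present is an assumption of this sketch.
  if (!is.na(ll)) return(ll)
  if (!is.na(ug)) return(ug)
  # Otherwise take the id reported by the most "other sources".
  votes <- others[!is.na(others)]
  if (length(votes) == 0) return(NA_character_)
  counts <- table(votes)
  names(counts)[which.max(counts)]  # ties go to the first id in sorted order
}

# Made-up maps for four probes; NA means the source gave no map.
maps <- data.frame(
  probe     = c("p1", "p2", "p3", "p4"),
  locuslink = c("101", NA, NA, NA),
  unigene   = c("101", "202", NA, NA),
  src_a     = c(NA, "999", "303", NA),
  src_b     = c(NA, NA, "303", NA),
  src_c     = c(NA, NA, "404", NA),
  stringsAsFactors = FALSE
)

maps$unified <- vapply(seq_len(nrow(maps)), function(i) {
  unify_one(maps$locuslink[i], maps$unigene[i],
            unlist(maps[i, c("src_a", "src_b", "src_c")]))
}, character(1))
```

Probe p2 keeps the UniGene map even though one other source disagrees, p3 gets 
the id that two of the three other sources agree on, and p4 stays unmapped.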

>Is it possible that ABPkgBuilder requires that there is a locus for your
>Genbank accession number in order
>to access information from the other databases (LocusLink, UniGene, Golden
>Path, Gene Ontology, and KEGG)?

AnnBuilder does the mapping using LocusLink id as the point of linkage between 
data sources. If a GenBank accession number can be mapped to a LocusLink id, 
annotation data may be obtained. Otherwise, no annotation data will be obtained.
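
To make the linkage concrete, here is a small sketch using plain data frames 
and merge(). It is not how AnnBuilder is implemented internally; the table 
names, column names, and values are all invented for illustration.

```r
# Made-up base file: one clone maps to a LocusLink id, the other does not.
base <- data.frame(
  probe     = c("clone1", "clone2"),
  accession = c("ACC_A", "ACC_B"),
  stringsAsFactors = FALSE
)

# Made-up accession-to-LocusLink map: only ACC_A is present.
acc2ll <- data.frame(
  accession = "ACC_A",
  locuslink = "101",
  stringsAsFactors = FALSE
)

# Made-up annotation keyed by LocusLink id.
annot <- data.frame(
  locuslink = "101",
  unigene   = "Hs.1001",
  pathway   = "PATH_EXAMPLE",
  stringsAsFactors = FALSE
)

# Step 1: attach LocusLink ids to the base file, keeping unmapped rows.
step1 <- merge(base, acc2ll, by = "accession", all.x = TRUE)

# Step 2: use the LocusLink id as the key into the annotation data.
result <- merge(step1, annot, by = "locuslink", all.x = TRUE)
```

Here clone1 picks up its annotation through LocusLink id 101, while clone2 has 
no LocusLink id and so all of its annotation columns are NA.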

>
>If ABPkgBuilder does not allow me to get a unigene cluster from a Genbank
>Accession number for my clones
>are there other functions within Bioconductor that will enable me to achieve
>my goal?

When the annotation is done, you do have UniGene ids as one of the annotation 
elements. If you want to use UniGene ids as the base mapping type, you will have 
to get the correct UniGene ids yourself.



>
>thanks very much for your help,
>Annie.
>
>-----Original Message-----
>From: John Zhang [mailto:jzhang at jimmy.harvard.edu]
>Sent: Friday, December 12, 2003 1:56 PM
>To: Annie.Law at nrc-cnrc.gc.ca
>Subject: RE: [BioC] Annotation data package is not being created
>
>
>I tried your base file. It stopped because the clones were not mapped to any
>enzyme annotation data. I will add some error trapping functions, but
>meanwhile you may try some real data to see how it works for you. The most
>time consuming parts are source data processing. The size of your base file
>does not reduce the time of execution much.
>
>>From: "Law, Annie" <Annie.Law at nrc-cnrc.gc.ca>
>>To: "'John Zhang'" <jzhang at jimmy.harvard.edu>
>>Subject: RE: [BioC] Annotation data package is not being created
>>Date: Thu, 11 Dec 2003 16:00:46 -0500
>>MIME-Version: 1.0
>>X-Keywords: 
>>
>>Hi John,
>>
>>Here is my file.
>>
>>thank you,
>>Annie.
>>
>>
>>-----Original Message-----
>>From: John Zhang [mailto:jzhang at jimmy.harvard.edu]
>>Sent: Thursday, December 11, 2003 3:41 PM
>>To: Annie.Law at nrc-cnrc.gc.ca
>>Cc: bioconductor at stat.math.ethz.ch
>>Subject: Re: [BioC] Annotation data package is not being created
>>
>>
>>Could you send me a copy of your base file so that I can try it to figure
>>out what might be wrong? Thanks.
>>>
>>>I would appreciate help with the following.  I was following the vignette
>>>"How to use AnnBuilder" and tried to adapt it to my goal of creating an
>>>annotation data package with the UniGene identifiers.  I made some very
>>>minor changes and used the file sampclonegb2, which is just a small text
>>>file in which the first column is a list of IMAGE clone IDs and the second
>>>column is a list of GenBank accession numbers.  I used the following lines
>>>and sampclonegb2 seems to load properly, but then I got the error message
>>>listed below.  Some output files are created (for example, the XML file),
>>>but my input data has not been mapped to any of the information from the
>>>databases.
>>>"Error in "colnames<-"(`*tmp*`, value = colNames) :
>>>        attempt to set colnames on object with less than two dimensions"
>>>I am not sure what I am missing.
>>>
>>>Also, my current file sampclonegb2 is very simple in that I have one
>>>GenBank accession number for each clone ID.  My actual source file contains
>>>cases where I have more than one GenBank accession number associated with
>>>a clone ID.  What is the best way to approach this?
>>>
>>>thanks very much,
>>>Annie.
>>>
>>>
>>>library(AnnBuilder)
>>>read.table(file.path(.path.package("AnnBuilder"), "data", "sampclonegb2"),
>>>sep = "\t", header = FALSE, as.is = TRUE)
>>>myBase <- file.path(.path.package("AnnBuilder"), "data", "sampclonegb2")
>>>myBaseType <- "gb"
>>>mySrcUrls <- getSrcUrl("all", organism = "human")
>>>mySrcUrls
>>>myDir <- tempdir()
>>>if (.Platform$OS.type == "unix") {
>>>fromWeb <- TRUE
>>>} else {
>>>fromWeb <- FALSE
>>>}
>>>if (.Platform$OS.type != "windows") {
>>>ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>>>myBaseType,
>>>otherSrc = NULL, pkgName = "abmyPkg", pkgPath = myDir,
>>>organism = "human", version = "1.1.0", makeXML = TRUE,
>>>author = list(author = "Annie", maintainer = "myname at myemail.com"),
>>>fromWeb =fromWeb)}
>>>
>>>"It may take me a while to process the data. Be patient!"
>>>Error in "colnames<-"(`*tmp*`, value = colNames) :
>>>        attempt to set colnames on object with less than two dimensions
>>>
>>>_______________________________________________
>>>Bioconductor mailing list
>>>Bioconductor at stat.math.ethz.ch
>>>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>>


