[BioC] AnnBuilders paseData() doesn't recognize ACCs with underscore?

Benjamin Otto b.otto at uke.uni-hamburg.de
Thu Jan 18 16:18:19 CET 2007


Hi John,

Here is a correction to my last email. My brain is probably working in
power-save mode today, but now I'm a little confused:

1) gbUGParser should take GenBank ids (accessions) and return UniGene ids,
right?
2) NM_xxxxxx ids may denote reference sequences, but they still ARE
accessions, right? AND they are GenBank identifiers.

So gbUGParser SHOULD recognize them as valid identifiers.
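
To pin down exactly which ids go missing, the keys of the geneNMap file can be compared with the row names of the parseData() result, both copied from the example quoted below. This is only a minimal base-R sketch; the variable names are made up for illustration and nothing here calls AnnBuilder itself.

```r
# Keys of the geneNMap file, as listed in the quoted example
input_ids <- c("32468_f_at", "32469_at", "NM_001815", "BF897514",
               "38912_at", "BC028014", "D90042")

# Row names of the matrix that parseData(ug) returned in the quoted example
output_ids <- c("32468_f_at", "32469_at", "38912_at", "BF897514", "D90042")

# Ids present in the input but absent from the parser output
missing_ids <- setdiff(input_ids, output_ids)
print(missing_ids)
```

For the data in this thread that leaves NM_001815 and BC028014, so one of the two dropped ids contains no underscore at all.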

Regards,

benjamin


-----Original Message-----
From: John Zhang [mailto:jzhang at jimmy.harvard.edu] 
Sent: 17 January 2007 15:12
To: bioconductor at stat.math.ethz.ch; b.otto at uke.uni-hamburg.de
Subject: Re: [BioC] AnnBuilders paseData() doesn't recognize ACCs with
underscore?


>
>parseData() seems to have problems in recognition of accession numbers 
>including an underscore like "NM_001815". The function just doesn't 
>find them although they do exist in the database file.

You have used the wrong parser. There are parsers, such as egRefseqParser and
gbNRef2LLParser, that handle RefSeq ids with underscores. You need to pick
one that fits your data.
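
One way to follow this advice is to split the mapping by accession style first, so that each subset can be handed to a parser that fits it. The sketch below is base R only and makes some assumptions: the column names `probe` and `acc` are hypothetical, and the regular expression (two capital letters followed by an underscore) is only an approximation of what a RefSeq accession looks like. It does not call any AnnBuilder function.

```r
# Mappings copied from the geneNMap file quoted in this thread
gene_n_map <- data.frame(
  probe = c("32468_f_at", "32469_at", "NM_001815", "BF897514",
            "38912_at", "BC028014", "D90042"),
  acc   = c("D90278;M16652", "L00693", "NM_001815", "BF897514",
            "D90042", "BC028014", "D90042"),
  stringsAsFactors = FALSE
)

# Assumed pattern: RefSeq-style accessions start with two capitals and "_"
# (e.g. NM_, NR_, XM_). Only the first accession of a ";"-separated entry
# is checked, so rows mixing both styles would need extra handling.
is_refseq <- grepl("^[A-Z]{2}_", gene_n_map$acc)

refseq_rows  <- gene_n_map[is_refseq, ]   # candidates for a RefSeq-aware parser
genbank_rows <- gene_n_map[!is_refseq, ]  # plain GenBank accessions

print(refseq_rows)
```

Each subset could then be written to its own base file with write.table(..., sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE) and parsed separately.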

>
>Here is the example I'm trying to get working:
>
>>library(AnnBuilder)
>>pkgpath <- .find.package("AnnBuilder")
>># unigene infos
>>ugUrl <- "C:/Programme/R/R-2.4.1/library/AnnBuilder/data/Ths.data"
>># parsing
>>ug <- UG(srcUrl = ugUrl, parser = file.path(pkgpath,
>>"scripts", "gbUGParser"), baseFile = "geneNMap",
>>organism = "Homo sapiens", built = "N/A", fromWeb = FALSE)
>>parseData(ug)
>
>The geneNMap file has the entries:
>
>32468_f_at	D90278;M16652
>32469_at	L00693
>NM_001815	NM_001815
>BF897514	BF897514
>38912_at	D90042
>BC028014	BC028014
>D90042	D90042
>
>I get out:
>           [,1]         [,2]
>32468_f_at "32468_f_at" "1084;63036"
>32469_at   "32469_at"   "1084"      
>38912_at   "38912_at"   "10"        
>BF897514   "BF897514"   "1084"      
>D90042     "D90042"     "10"        
>
>
>Thanks a lot for your help in advance..
>
>Regards,
>
>Benjamin
>
>
>-- 
>Benjamin Otto
>Universitaetsklinikum Eppendorf Hamburg
>Institut fuer Klinische Chemie
>Martinistrasse 52
>20246 Hamburg
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
http://news.gmane.org/gmane.science.biology.informatics.conductor

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084


