[BioC] Gene Database Files (eg gene2accession) complete?

Benjamin Otto b.otto at uke.uni-hamburg.de
Tue Feb 20 15:48:03 CET 2007


Dear Bioconductors,

Apparently the database files available on the RefSeq, Gene and LocusLink FTP
sites do not contain all IDs that can be uniquely identified via the NCBI
web interface. Where can I find database files containing the rest?

Querying the RefSeq identifier "NM_032722" in the NCBI Gene database returns
exactly one hit:

C1orf170
Official Symbol: C1orf170 and Name: chromosome 1 open reading frame 170
[Homo sapiens]
Other Aliases: MGC13275, RP11-54O7.8
Other Designations: hypothetical protein LOC84808
Chromosome: 1; Location: 1p36.33
GeneID: 84808

So I assumed I would be able to find this gene in the current gene2accession
or gene2refseq files (from the Gene FTP site) or in the LocusLink LL_tmpl
file. None of them contains the identifier. The same is true for the RefSeq
RefSeq-release21.catalog and accession2geneid files.

A closer look at the hit reveals that the sequence has been suppressed.
Does anybody know whether there is a database file which SHOULD contain
this identifier (although it is suppressed)? My current problem is that, of
about 26000 accessions, I can find only around 13000 in the files mentioned
above.
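
For anyone who wants to reproduce this kind of count, here is a minimal Python
sketch of how the comparison could be done. It is only an illustration: the
function name, the column positions (0-based columns 3, 5 and 7 of a
tab-delimited gene2accession-style table) and the sample rows are assumptions
made up for the example, not taken from the actual files.

```python
def split_found_missing(accessions, table_lines, columns=(3, 5, 7)):
    """Split accessions into those present in a tab-delimited table and those absent.

    `columns` lists the 0-based columns assumed to hold accession.version
    values; adjust it to the layout of the file actually in use.
    Version suffixes (".1", ".2", ...) are ignored on both sides.
    """
    wanted = {acc.split(".")[0] for acc in accessions}
    seen = set()
    for line in table_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        for idx in columns:
            if idx < len(fields):
                acc = fields[idx].split(".")[0]
                if acc in wanted:
                    seen.add(acc)
    return sorted(seen), sorted(wanted - seen)


# Made-up rows for illustration only (not real database content).
sample = [
    "#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version\n",
    "0\t1\tEXAMPLE\tXX_000001.1\n",
]

found, missing = split_found_missing(["XX_000001", "NM_032722"], sample)
print("found:", found)
print("missing:", missing)
```

With a real file, the lines could be supplied by opening it and passing the
file object as `table_lines`; the list of missing accessions is then the set
to look up elsewhere.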

Regards

benjamin


-- 
Benjamin Otto
Universitaetsklinikum Eppendorf Hamburg
Institut fuer Klinische Chemie
Martinistrasse 52
20246 Hamburg


