[BioC] Gene Database Files (eg gene2accession) complete?

Sean Davis sdavis2 at mail.nih.gov
Tue Feb 20 16:48:15 CET 2007


On Tuesday 20 February 2007 09:48, Benjamin Otto wrote:
> Dear bioconductors,
>
> Obviously the database files accessible at the refseq, gene or locuslink
> ftp sites do not contain all ids which can be uniquely identified via the
> ncbi web interface. Where can I find database files containing the rest?
>
> Query the RefSeq identifier "NM_032722" via NCBI in the gene database and
> it will return exactly one hit:
>
> C1orf170 	Links
> Official Symbol: C1orf170 and Name: chromosome 1 open reading frame 170
> [Homo sapiens]
> Other Aliases: MGC13275, RP11-54O7.8
> Other Designations: hypothetical protein LOC84808
> Chromosome: 1; Location: 1p36.33
> GeneID: 84808
>
> So I supposed that I should be able to track this gene in the current
> gene2accession, gene2refseq (from the gene ftp site) or locuslink LL_tmpl
> file. Neither contains the identifier. Same is true for the RefSeq
> RefSeq-release21.catalog and accession2geneid files.
>
> Now a closer look at the hit reveals that the sequence has been suppressed.
> Does anybody know whether there is a database file which SHOULD contain
> this identifier (although it's suppressed)? My current problem is that
> from about 26000 accessions I can only find around 13000 in the above
> mentioned files.

Do you have access to the sequences?  If so, you may want to simply BLAST
them against RefSeq to get the most up-to-date annotation.
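
Before BLASTing everything, it may help to isolate the accessions that are
really missing from the flat file. The Python sketch below is only an
illustration under stated assumptions: it assumes a tab-delimited
gene2accession-style table with the GeneID in column 2 and the RNA
accession (possibly carrying a ".version" suffix) in column 4, so check the
header of your own copy first. The function names and the demo rows are
made up for the example.

```python
import csv
import io


def strip_version(acc):
    """Drop a trailing '.N' version suffix, e.g. 'NM_032722.2' -> 'NM_032722'."""
    return acc.split(".", 1)[0]


def split_found_missing(accessions, table, acc_col=3, gene_col=1):
    """Return (found, missing) for the queried accessions.

    `table` is any iterable of tab-delimited lines (an open file works).
    `acc_col` and `gene_col` are zero-based column indexes and are
    ASSUMPTIONS about the file layout; adjust them to match the header.
    """
    wanted = {strip_version(a) for a in accessions}
    found = {}
    reader = csv.reader(table, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, the '#' header line, and rows that are too short.
        if not row or row[0].startswith("#"):
            continue
        if len(row) <= max(acc_col, gene_col):
            continue
        acc = strip_version(row[acc_col])
        if acc in wanted:
            # Keep the first GeneID seen for each accession.
            found.setdefault(acc, row[gene_col])
    missing = sorted(wanted - set(found))
    return found, missing


if __name__ == "__main__":
    # Made-up two-line table, for illustration only (not real NCBI records).
    demo = io.StringIO(
        "#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version\n"
        "9606\t111\tVALIDATED\tNM_000001.1\n"
    )
    found, missing = split_found_missing(["NM_000001", "NM_032722"], demo)
    print("found:", found)
    print("missing (candidates for BLAST):", missing)
```

Comparing with the version suffix stripped avoids counting an accession as
missing only because the query list and the file disagree on ".N". Whatever
is still in the missing list afterwards is what would need the sequence
search.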

Sean


