[BioC] athPkgBuilder data source: missing probesets

Nianhua Li nli at fhcrc.org
Thu Aug 10 19:25:20 CEST 2006


Dear Tine and Björn,

Thanks a lot for your detailed replies. I really appreciate them. I 
would like to summarize them to make sure we are on the same page:

Now I understand that we should use the AGI locus as the gene identifier,
and that it can be missing for some probesets. It also seems that the
EntrezGene ID is unnecessary. I was actually more interested in the
*source*: should we use *Affymetrix's annotation*
(https://www.affymetrix.com/support/technical/byproduct.affx?product=arab)
or *TAIR's*
(ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2006-07-14.txt)
for probeset-to-gene mapping? You both prefer TAIR's, don't you? The
current implementation (athPkgBuilder) is based on Affymetrix's.

Thanks for the PubMed source 
(ftp://ftp.arabidopsis.org/home/tair/User_Requests/LocusPublished.08012006.txt). 
Should I make it the default in athPkgBuilder then?

It is fairly easy to obtain KEGG annotation. The file
ftp://ftp.genome.jp/pub/kegg/genomes/ath/ath_tair.list maps AGI locus
to KEGG Gene ID. If you look at the file, the two identifiers
always have the same value. The file
ftp://ftp.genome.jp/pub/kegg/pathways/ath/ath_gene_map.tab then maps KEGG
Gene ID to KEGG pathway ID. Finally, the file
ftp://ftp.genome.jp/pub/kegg/pathways/map_title.tab maps KEGG pathway ID
to pathway title. Another detail is that KEGG has two "genome codes" for
Arabidopsis: ath and eath. "ath" contains mappings between pathways and
CDS (real genes), whereas "eath" maps pathways to ESTs. For example,
"eath00051" and "ath00051" show the same pathway graph, but link to
ESTs and CDS respectively:
  http://www.genome.jp/dbget-bin/show_pathway?eath00051
  http://www.genome.jp/dbget-bin/show_pathway?ath00051
Should we use "ath" or "eath"?
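
To make the three-file chain above concrete, here is a rough sketch in
Python (not the athPkgBuilder code, which is R). It assumes all three
files are tab-separated with two columns, that ath_tair.list carries
"ath:" and "tair:" prefixes, and that ath_gene_map.tab lists pathway
numbers separated by spaces in its second column. The function names
and the sample rows are made up for illustration; the gene IDs are
placeholders, not real annotations.

```python
def read_pairs(text):
    """Split two-column, tab-separated text into (left, right) tuples."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            pairs.append((fields[0].strip(), fields[1].strip()))
    return pairs


def strip_prefix(identifier):
    """Drop a database prefix such as 'ath:' or 'tair:' (assumed format)."""
    return identifier.split(":", 1)[-1]


def annotate(tair_list, gene_map, map_title):
    """Chain AGI locus -> KEGG gene -> pathway ID -> pathway title."""
    # Step 1: KEGG Gene ID to AGI locus (assumed column order: KEGG, TAIR).
    kegg_to_agi = {strip_prefix(k): strip_prefix(a)
                   for k, a in read_pairs(tair_list)}
    # Step 3 lookup table: pathway ID to title.
    titles = dict(read_pairs(map_title))
    annotation = {}
    # Step 2: KEGG Gene ID to one or more pathway IDs.
    for kegg_gene, pathway_field in read_pairs(gene_map):
        agi = kegg_to_agi.get(kegg_gene)
        if agi is None:
            continue  # gene has no AGI locus in the first file
        for pathway_id in pathway_field.split():
            annotation.setdefault(agi, []).append(
                (pathway_id, titles.get(pathway_id)))
    return annotation


# Illustrative rows only; the gene-to-pathway assignments are invented.
TAIR_LIST = "ath:AT1G00001\ttair:AT1G00001\nath:AT1G00002\ttair:AT1G00002\n"
GENE_MAP = "AT1G00001\t00010 00051\nAT1G00002\t00051\n"
MAP_TITLE = ("00010\tGlycolysis / Gluconeogenesis\n"
             "00051\tFructose and mannose metabolism\n")

if __name__ == "__main__":
    for locus, pathways in annotate(TAIR_LIST, GENE_MAP, MAP_TITLE).items():
        print(locus, pathways)
```

Since the two identifiers in ath_tair.list are always equal, the first
step could probably be skipped, but keeping it makes the sketch robust
if that ever changes.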

Also, it seems the gene description (ath1121501GENENAME) part should keep
the current implementation (based on
ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes).

thanks again

nianhua


