[BioC] athPkgBuilder data source: missing probesets

Thomas Girke thomas.girke at ucr.edu
Thu Aug 10 19:57:31 CEST 2006


Nianhua,

I suggest using the probeset-to-gene mappings from TAIR, since they
are in charge of the annotation of this genome. That way one can be sure
the probeset-to-gene mappings stay in step with new annotation releases
of this genome.
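
As a rough illustration of reading such a mapping while keeping track of
probesets without a locus, here is a minimal Python sketch. The column names
(array_element_name, locus), the ";" separator for multiple loci, and the
"no_match" placeholder are assumptions about the TAIR file layout, and the
sample rows are made up for the example; check them against the real file
before relying on this.

```python
import csv
import io


def probeset_to_loci(lines, id_col="array_element_name", locus_col="locus",
                     missing=("", "no_match")):
    """Map each probeset ID to a list of AGI loci (empty list if none).

    Column names, the ';' separator and the 'no_match' placeholder are
    assumptions about the file layout, not confirmed by this thread.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    mapping = {}
    for row in reader:
        raw = (row.get(locus_col) or "").strip()
        # Split multi-locus entries and drop placeholders for "no locus".
        loci = [x.strip().upper() for x in raw.split(";")
                if x.strip() not in missing]
        mapping[row[id_col].strip()] = loci
    return mapping


# Made-up sample rows in the assumed layout.
sample = io.StringIO(
    "array_element_name\tlocus\n"
    "261585_at\tAT1G01010\n"
    "261568_at\tAT1G01030;AT1G01040\n"
    "AFFX-BioB-3_at\tno_match\n"
)
mapping = probeset_to_loci(sample)
# Probesets that would end up without a gene identifier in the package.
unmapped = sorted(p for p, loci in mapping.items() if not loci)
```

Keeping unmapped probesets in the result with an empty list, rather than
dropping them, makes it easy to report how many probesets are missing a locus
after each annotation release.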

Also, I would consider including the gene/locus-to-GO mappings from
TAIR. This data set can be downloaded directly from geneontology.org:

http://geneontology.org/GO.current.annotations.shtml
http://www.geneontology.org/cgi-bin/downloadGOGA.pl/gene_association.tair.gz
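
For what it is worth, a gene association file could be turned into
locus-to-GO mappings along these lines. This is only a sketch: it relies on
the standard tab-delimited association format ("!" comment lines, GO ID in
column 5, evidence code in column 7), and it assumes that the AGI locus code
shows up somewhere in the symbol, name, or synonym columns of the TAIR file.
The sample record is invented for the example.

```python
import re
from collections import defaultdict

# AGI locus codes look like AT1G01010 (chromosomes 1-5, C, M).
AGI = re.compile(r"\bAT[1-5CM]G\d{5}\b", re.IGNORECASE)


def locus_to_go(lines):
    """Collect (GO ID, evidence code) pairs per AGI locus."""
    result = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header comments and blank lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue  # not a full association record
        go_id, evidence = f[4], f[6]
        # Assumption: the AGI code appears in the symbol (col 3),
        # name (col 10) or synonym (col 11) field.
        match = AGI.search(" ".join((f[2], f[9], f[10])))
        if match:
            result[match.group(0).upper()].add((go_id, evidence))
    return result


# Invented sample record in the 15-column association layout.
record = "\t".join([
    "TAIR", "locus:0000000", "ANAC001", "", "GO:0003700",
    "TAIR:Publication:0000000", "ISS", "", "F", "AT1G01010",
    "AT1G01010|ANAC001", "protein", "taxon:3702", "20020626", "TAIR",
])
go_map = locus_to_go(["!comment line\n", record + "\n"])
```

Keeping the evidence code next to each GO ID leaves room to filter out
weaker evidence classes later.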

Thanks for taking care of this.

Thomas


On Thu 08/10/06 10:25, Nianhua Li wrote:
> Dear Tine and Björn,
> 
> Thanks a lot for your detailed replies. I really appreciate them. I 
> would like to summarize them to make sure we are on the same page:
> 
> Now I understand that we should use the AGI locus as the gene identifier 
> and that it can be missing for some probesets. It also seems the 
> EntrezGene ID is unnecessary. I was actually more interested in the 
> *source*: should we use *Affymetrix's annotation* 
> (https://www.affymetrix.com/support/technical/byproduct.affx?product=arab) 
> or *TAIR's* 
> (ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2006-07-14.txt) 
> for probeset-to-gene mapping.  You both prefer TAIR's, don't you? The 
> current implementation (athPkgBuilder) is based on Affymetrix's.
> 
> Thanks for the PubMed source 
> (ftp://ftp.arabidopsis.org/home/tair/User_Requests/LocusPublished.08012006.txt). 
> Should I make it the default in athPkgBuilder then?
> 
> It is fairly easy to obtain KEGG annotation. File 
> ftp://ftp.genome.jp/pub/kegg/genomes/ath/ath_tair.list  maps AGI locus 
> to KEGG Gene ID.  If you look at the file, the two identifiers 
> always have the same value.  And then file 
> ftp://ftp.genome.jp/pub/kegg/pathways/ath/ath_gene_map.tab maps KEGG 
> Gene ID to KEGG pathway ID. Finally file 
> ftp://ftp.genome.jp/pub/kegg/pathways/map_title.tab maps KEGG pathway ID 
> to pathway title. Another detail is that KEGG has two "genome codes" for 
> Arabidopsis: ath and eath. "ath" contains mappings between pathways and 
> CDS (real genes), whereas "eath" maps pathways to ESTs. For example, 
> "eath00051" and "ath00051" show the same pathway graph, but link to 
> ESTs and CDS respectively:
>   http://www.genome.jp/dbget-bin/show_pathway?eath00051
>   http://www.genome.jp/dbget-bin/show_pathway?ath00051
> Should we use "ath" or "eath"?
> 
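
The three-file chain described above (AGI locus to KEGG Gene ID, KEGG Gene ID
to pathway ID, pathway ID to title) could be joined roughly as follows. This
is a sketch under assumptions: that ath_tair.list holds two tab-separated
columns with database prefixes such as "ath:" and "tair:", that
ath_gene_map.tab lists space-separated pathway numbers per gene, and that
map_title.tab holds pathway number and title. The function names are
hypothetical and the gene-to-pathway rows are made up for illustration.

```python
def strip_prefix(identifier):
    """Drop a leading 'db:' prefix if present (e.g. 'ath:AT1G01090')."""
    return identifier.split(":", 1)[-1].strip()


def read_two_columns(lines):
    """Yield (first, second) from tab-delimited lines, skipping short rows."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[0].strip():
            yield parts[0].strip(), parts[1].strip()


def locus_to_pathway_titles(tair_list, gene_map, map_title, org="ath"):
    """Join the three files into {AGI locus: [(pathway ID, title), ...]}."""
    kegg_to_locus = {strip_prefix(k): strip_prefix(t).upper()
                     for k, t in read_two_columns(tair_list)}
    titles = dict(read_two_columns(map_title))
    result = {}
    for gene, pathways in read_two_columns(gene_map):
        locus = kegg_to_locus.get(strip_prefix(gene))
        if locus is None:
            continue  # KEGG gene without an AGI locus in the list file
        # Prefix pathway numbers with the genome code, e.g. 'ath00051'.
        result[locus] = [(org + p, titles.get(p, ""))
                         for p in pathways.split()]
    return result


# Made-up rows in the assumed layouts.
tair_list = ["ath:AT1G01090\ttair:AT1G01090\n"]
gene_map = ["AT1G01090\t00010 00051\n"]
map_title = ["00010\tGlycolysis / Gluconeogenesis\n",
             "00051\tFructose and mannose metabolism\n"]
pathways = locus_to_pathway_titles(tair_list, gene_map, map_title)
```

The genome code is a parameter, so the same join can be run with "eath" if
the EST-based mappings turn out to be preferred.
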
> Also it seems the gene description part (ath1121501GENENAME) should keep 
> the current implementation (based on 
> ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes ).
> 
> thanks again
> 
> nianhua
> 

-- 
Thomas Girke, Ph.D.
1008 Noel T. Keen Hall
Center for Plant Cell Biology (CEPCEB)
University of California
Riverside, CA 92521

E-mail: thomas.girke at ucr.edu
Website: http://faculty.ucr.edu/~tgirke
Ph: 951-827-2469
Fax: 951-827-4437


