[BioC] athPkgBuilder data source :missing probesets

Tine Casneuf tineke.casneuf at ebi.ac.uk
Thu Aug 10 22:03:20 CEST 2006


Hi guys,

I agree with Thomas. TAIR took over from TIGR last year and is now 
taking care of the annotation (that's why the previous releases of the 
genome were called TIGRx and the latest one is TAIR6).

About ath and eath in KEGG: we map probesets to locus IDs and not to 
the transcripts themselves (otherwise you could distinguish between 
splice variants, for example), so I guess 'ath' will be all right too.
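
To illustrate what mapping at the locus level means, here is a minimal sketch in Python. It is not how athPkgBuilder does it. It assumes that transcript (gene model) IDs are written as the AGI locus plus a ".N" suffix, and the function names and probeset IDs are made up for the example.

```python
import re

# Assumption: an AGI locus looks like AT1G01010 (chromosome 1-5, C or M),
# and a transcript/gene model ID adds a ".N" suffix, e.g. AT1G01010.2.
AGI_ID = re.compile(r"^(AT[1-5CM]G\d{5})(?:\.\d+)?$", re.IGNORECASE)


def to_locus(agi_id):
    """Return the upper-case AGI locus for a locus or transcript ID, else None."""
    match = AGI_ID.match(agi_id.strip())
    return match.group(1).upper() if match else None


def probesets_to_loci(pairs):
    """Collapse (probeset, AGI ID) pairs into probeset -> sorted list of loci."""
    mapping = {}
    for probeset, agi_id in pairs:
        locus = to_locus(agi_id)
        if locus is not None:
            mapping.setdefault(probeset, set()).add(locus)
    return {probeset: sorted(loci) for probeset, loci in mapping.items()}


# Hypothetical input: one probeset hitting two splice variants of one locus.
example = [
    ("probeset_A", "AT1G01010.1"),
    ("probeset_A", "AT1G01010.2"),
    ("probeset_B", "not_an_agi_id"),
]
result = probesets_to_loci(example)
```

Both splice variants collapse onto the same locus, so the KEGG "ath" entries, which are keyed by locus, line up with the probeset mapping. A probeset with no recognisable AGI ID simply gets no entry.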

About the PMIDs: I mailed back the lady who told me about the file, 
asking whether they are going to keep it where it is 
(ftp://ftp.arabidopsis.org/home/tair/User_Requests/LocusPublished.08012006.txt). 
Maybe they made it available because I asked for it (since it is in the 
user request directory).

The ath1121501GENENAME works fine for me!


best wishes and thank you,

tine
 

Thomas Girke wrote:

>Nianhua,
>
>I suggest using the probeset-to-gene mappings from TAIR, since they
>are in charge of the annotation of this genome. This way one can be sure the 
>probeset-to-gene mappings align with new annotation releases of this
>genome.
>
>Also, I would consider including the gene/locus-to-GO mappings from
>TAIR. This data set is downloadable directly from GO.org:
>
>http://geneontology.org/GO.current.annotations.shtml
>http://www.geneontology.org/cgi-bin/downloadGOGA.pl/gene_association.tair.gz
>
>Thanks for taking care of this.
>
>Thomas
>
>
>On Thu 08/10/06 10:25, Nianhua Li wrote:
>  
>
>>Dear Tine and Björn,
>>
>>Thanks a lot for your detailed replies. I really appreciate them. I 
>>would like to summarize them to make sure we are on the same page:
>>
>>Now I understand that we should use the AGI locus as the gene identifier 
>>and that it can be missing for some probesets. It also seems the 
>>EntrezGene ID is unnecessary. I was actually more interested in the 
>>*source*: should we use *Affymetrix's annotation* 
>>(https://www.affymetrix.com/support/technical/byproduct.affx?product=arab) 
>>or *TAIR's* 
>>(ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2006-07-14.txt) 
>>for probeset-to-gene mapping? You both prefer TAIR's, don't you? The 
>>current implementation (athPkgBuilder) is based on Affymetrix's.
>>
>>Thanks for the PubMed source 
>>(ftp://ftp.arabidopsis.org/home/tair/User_Requests/LocusPublished.08012006.txt). 
>>Should I make it the default in athPkgBuilder then?
>>
>>It is fairly easy to obtain the KEGG annotation. The file 
>>ftp://ftp.genome.jp/pub/kegg/genomes/ath/ath_tair.list maps AGI locus 
>>to KEGG Gene ID. If you look at the file, the two identifiers 
>>always have the same value. Then the file 
>>ftp://ftp.genome.jp/pub/kegg/pathways/ath/ath_gene_map.tab maps KEGG 
>>Gene ID to KEGG pathway ID. Finally, the file 
>>ftp://ftp.genome.jp/pub/kegg/pathways/map_title.tab maps KEGG pathway ID 
>>to pathway title. Another detail is that KEGG has two "genome codes" for 
>>Arabidopsis: ath and eath. "ath" contains mappings between pathways and 
>>CDS (real genes), whereas "eath" maps pathways to ESTs. For example, 
>>"eath00051" and "ath00051" show the same pathway graph but link to 
>>ESTs and CDS respectively:
>>  http://www.genome.jp/dbget-bin/show_pathway?eath00051
>>  http://www.genome.jp/dbget-bin/show_pathway?ath00051
>>Should we use "ath" or "eath"?
>>
>>Also, it seems the gene description (ath1121501GENENAME) part should keep 
>>the current implementation (based on 
>>ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR_sequenced_genes ).
>>
>>thanks again
>>
>>nianhua
>>
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>>Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>    
>>
>
>  
>


