[BioC] Filter out pombe probeset from cerevisiae probesets for yeast2 Affymetrix chip

Marc Carlson mcarlson at fhcrc.org
Mon Dec 3 19:22:27 CET 2007


Guiyuan Lei wrote:
> Hi Jim,
>
> Thanks for the suggestion. In order to get gene names/symbols for as
> many cerevisiae probesets as possible, I downloaded the Yeast2
> annotation from Affymetrix:
> http://www.affymetrix.com/Auth/analysis/downloads/na24/ivt/Yeast_2.na24.annot.csv.zip
>
> Firstly, I found that Bioconductor has gene names for more cerevisiae
> probesets than Affymetrix does. In Yeast2GENENAME (from Bioconductor),
> 4640 of the 5900 probesets (after filtering out the 5028 pombe
> probesets listed in the mask file s_cerevisiae.msk) have gene names,
> while only 4557 of the 5900 probesets (also after filtering out the
> 5028 pombe probesets, here those labeled as species "pombe" in
> Yeast_2.na24.annot.csv) have gene symbols in Yeast_2.na24.annot.csv.
> The Yeast_2.na24.annot.csv I used is the latest file, which was updated
> in November 2007. How could Affymetrix have less information than a
> third party (like Bioconductor)?
>
> Secondly, I found that the s_pombe.zip file from the following Affy
> web page is NOT consistent with Affymetrix's own annotation file
> (Yeast_2.na24.annot.csv, mentioned above):
> http://www.affymetrix.com/Auth/support/downloads/mask_files/s_pombe.zip
> 5814 probesets are labeled as "cerevisiae" in
> Yeast_2.na24.annot.csv, so I would expect at least 5814 probesets
> in s_pombe.msk in order to mask the cerevisiae probesets, but there are
> only 5749 probesets in s_pombe.msk. In addition, the probeset
> "177968_at" is not among the 10928 probesets of the Yeast2 chip but is
> in s_pombe.msk!!!
>
> Best regards,
> Guiyuan
>
>
> On Nov 29, 2007 4:21 PM, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>   
>> Hi Guiyuan,
>>
>> Guiyuan Lei wrote:
>>     
>>> Hi Jim,
>>>
>>> Many thanks. I have checked the s_pombe.msk and s_cerevisiae.msk
>>> files; the overlap between pombe and cerevisiae consists of the
>>> probesets with the prefixes "AFFX" and "RPTR". One strange thing is
>>> that a probeset called "177968_at" is in s_pombe.msk but is NOT among
>>> the 10928 probesets! So the overlap is 152 probesets.
>>>
>>> I have one more question: in Yeast2GENENAME, many probesets only
>>> have an ID but no gene name (it is "NA"). Is it possible to get a
>>> gene name/symbol for all 10928 probesets?
>>>       
>> You might check either netaffx or biomaRt, but if there are no gene
>> names for certain probesets in the annotation package that usually
>> indicates that the probesets in question interrogate things that have
>> yet to be named (e.g., ESTs, inferred genes, etc).
>>     
>
Hi guys,

It is definitely possible for our annotations to have more information
in a particular field (like gene symbol) than Affymetrix does.  This is
because we don't just repackage the annotation information from
Affymetrix.  Instead, we gather ONLY ID assignments from Affymetrix,
things like Entrez Gene IDs, GenBank accessions, etc.  Our annotation
pipeline collects one appropriate gene-based ID for each probeset from
Affymetrix, and this is meant to be basic information ONLY about
precisely which gene a particular probeset is designed to measure.  This
minimal information is the only data that we take from the Affymetrix
annotation files.  We then take those IDs to other repositories like
NCBI and use them to look up related information such as gene symbols.
I can't tell you exactly what process Affymetrix uses to create their
annotations, but given the large number of reasonable choices they could
make, it seems pretty likely that they do something slightly different
from what we do.
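
As a rough illustration of the two-step approach described above, here
is a minimal Python sketch.  Everything in it is made up for the
example: the probeset names, gene IDs and symbols are hypothetical, and
the two dictionaries merely stand in for the Affymetrix annotation file
and an external repository such as NCBI.  It is not the actual
Bioconductor pipeline code, only a toy showing how looking symbols up
through IDs can give a different count than reading the vendor's symbol
column directly.

```python
# Hypothetical toy data; none of these IDs or symbols come from real files.
# Stand-in for the vendor annotation file: one record per probeset.
vendor_annotation = {
    "probe_A_at": {"gene_id": "G1", "symbol": "ABC1"},
    "probe_B_at": {"gene_id": "G2", "symbol": None},  # ID assigned, no symbol
    "probe_C_at": {"gene_id": None, "symbol": None},  # nothing assigned
}

# Stand-in for an external repository that maps gene IDs to symbols.
repository_symbols = {"G1": "ABC1", "G2": "XYZ2"}


def symbols_via_ids(vendor, repository):
    """Take only the gene ID from the vendor, then look the symbol up elsewhere."""
    result = {}
    for probeset, record in vendor.items():
        gene_id = record["gene_id"]
        # A probeset without an ID, or with an ID unknown to the
        # repository, ends up with no symbol (None).
        result[probeset] = repository.get(gene_id) if gene_id is not None else None
    return result


def count_named(symbols):
    """Count the probesets that have a symbol."""
    return sum(1 for symbol in symbols.values() if symbol is not None)


# Symbols read straight from the vendor's own symbol column.
vendor_symbols = {p: r["symbol"] for p, r in vendor_annotation.items()}
# Symbols obtained by carrying only the ID over to the repository.
pipeline_symbols = symbols_via_ids(vendor_annotation, repository_symbols)

print(count_named(vendor_symbols), count_named(pipeline_symbols))
```

In this toy case the ID-based lookup names one more probeset than the
vendor's symbol column does, because the repository knows a symbol for
an ID that the vendor file leaves unnamed.  With real data the
difference could go in either direction.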


    Marc


