[BioC] Filter out pombe probeset from cerevisiae probesets for yeast2 Affymetrix chip

Guiyuan Lei guiyuanlei at gmail.com
Sun Dec 2 14:20:26 CET 2007


Hi Jim,

Thanks for the suggestion. To get gene names/symbols for as many
cerevisiae probesets as possible, I downloaded the Yeast_2
annotation from Affymetrix:
http://www.affymetrix.com/Auth/analysis/downloads/na24/ivt/Yeast_2.na24.annot.csv.zip

First, I found that Bioconductor has gene names for more cerevisiae
probesets than Affymetrix does. In Yeast2GENENAME (from Bioconductor),
4640 of the 5900 remaining probesets have gene names. These 5900 are
what is left after filtering out the 5028 pombe probesets listed in
the mask file s_cerevisiae.msk.

In Yeast_2.na24.annot.csv, only 4557 of the 5900 remaining probesets
have gene symbols. Here the 5900 are what is left after filtering out
the 5028 probesets labeled with species "pombe" in that file. The
Yeast_2.na24.annot.csv I used is the latest version, updated in
November 2007. How can Affymetrix have less information than a third
party such as Bioconductor?
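A count like the Affymetrix one above can be reproduced with a short script. The following is a minimal Python sketch, not the procedure actually used here. It assumes the annotation CSV begins with comment lines starting with `#`, then a header row with columns named `Probe Set ID`, `Species Scientific Name` and `Gene Symbol`, with `---` marking a missing symbol. These column names and the `---` convention are assumptions to check against the header of the downloaded file.

```python
import csv
from typing import Iterable, Tuple


def count_named_cerevisiae(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (probesets kept, probesets with a gene symbol) after
    dropping every row whose species column mentions pombe.

    ASSUMED layout: '#' comment lines, then a header row containing
    'Species Scientific Name' and 'Gene Symbol'; '---' means no symbol.
    """
    # Skip the '#' comment preamble so DictReader sees the header first.
    data = (ln for ln in lines if not ln.startswith("#"))
    reader = csv.DictReader(data)
    kept = named = 0
    for row in reader:
        if "pombe" in row["Species Scientific Name"].lower():
            continue  # filter out S. pombe probesets
        kept += 1
        symbol = row["Gene Symbol"].strip()
        if symbol and symbol != "---":
            named += 1
    return kept, named


if __name__ == "__main__":
    # Hypothetical rows; real files have many more columns.
    sample = [
        "#%create_date=example",
        '"Probe Set ID","Species Scientific Name","Gene Symbol"',
        '"p1_at","Saccharomyces cerevisiae","ACT1"',
        '"p2_at","Saccharomyces cerevisiae","---"',
        '"p3_at","Schizosaccharomyces pombe","cdc2"',
    ]
    print(count_named_cerevisiae(sample))
```

On the real file, something like `with open("Yeast_2.na24.annot.csv", newline="") as fh: print(count_named_cerevisiae(fh))` should work, provided the assumed column names match.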

Second, I found that the s_pombe.zip file from the following Affymetrix
page is NOT consistent with Affymetrix's own annotation file
(Yeast_2.na24.annot.csv, mentioned above):
http://www.affymetrix.com/Auth/support/downloads/mask_files/s_pombe.zip
5814 probesets are labeled "cerevisiae" in Yeast_2.na24.annot.csv.
Since s_pombe.msk is supposed to mask the cerevisiae probesets, I would
expect it to contain at least 5814 probesets, but it contains only 5749.
In addition, the probeset "177968_at" is in s_pombe.msk but is not
among the 10928 probesets on the Yeast2 chip!!!
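The consistency check between the mask file and the annotation comes down to two set differences. The Python sketch below is illustrative only. It assumes, hypothetically, that a .msk file has a fixed number of header lines followed by one probeset per line, with the ID in the first tab-separated field. The `header_lines` default of 2 is a guess to adjust after inspecting the file, and the probeset IDs in the example are made up.

```python
from typing import Dict, Iterable, Set


def read_mask_ids(lines: Iterable[str], header_lines: int = 2) -> Set[str]:
    """Collect probeset IDs from a mask file.

    ASSUMPTION (hypothetical layout): the first `header_lines` lines are a
    header; every remaining non-blank line starts with a probeset ID,
    followed by a tab and the probe list.
    """
    ids: Set[str] = set()
    for i, line in enumerate(lines):
        if i < header_lines or not line.strip():
            continue
        ids.add(line.split("\t")[0].strip())
    return ids


def check_mask(mask_ids: Set[str], cerevisiae_ids: Set[str],
               chip_ids: Set[str]) -> Dict[str, Set[str]]:
    """Compare a pombe mask (which should list the cerevisiae probesets)
    with the annotation and with all probesets present on the chip."""
    return {
        # labeled cerevisiae in the annotation but missing from the mask
        "unmasked_cerevisiae": cerevisiae_ids - mask_ids,
        # listed in the mask but absent from the chip entirely
        "not_on_chip": mask_ids - chip_ids,
    }


if __name__ == "__main__":
    mask_lines = ["Array Type\tYeast_2", "Probe Set Name",
                  "p1_at\t1-11", "177968_at\t1-11"]
    report = check_mask(read_mask_ids(mask_lines),
                        cerevisiae_ids={"p1_at", "p2_at"},
                        chip_ids={"p1_at", "p2_at", "p3_at"})
    print(report)
```

With the real files, `chip_ids` and `cerevisiae_ids` would come from the `Probe Set ID` and species columns of the annotation CSV. Given the observation above, "177968_at" would be expected to show up under `not_on_chip`.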

Best regards,
Guiyuan


On Nov 29, 2007 4:21 PM, James W. MacDonald <jmacdon at med.umich.edu> wrote:
> Hi Guiyuan,
>
> Guiyuan Lei wrote:
> > Hi Jim,
> >
> > Many thanks. I have checked the s_pombe.msk and s_cerevisiae.msk
> > files; the overlap between pombe and cerevisiae consists of the
> > probesets with the prefixes "AFFX" and "RPTR". One strange thing is
> > that one probeset, "177968_at", is in s_pombe.msk but is NOT among
> > the 10928 probesets on the chip! So the overlap is 152 probesets.
> >
> > I have one more question: in Yeast2GENENAME, many probesets have
> > only an ID and no gene name (the value is "NA"). Is it possible to
> > get a gene name/symbol for all 10928 probesets?
>
> You might check either netaffx or biomaRt, but if there are no gene
> names for certain probesets in the annotation package that usually
> indicates that the probesets in question interrogate things that have
> yet to be named (e.g., ESTs, inferred genes, etc).


