[BioC] affymetrix rat 2.0 gene annotation issue

Marc Carlson mcarlson at fhcrc.org
Thu Jul 24 19:06:53 CEST 2014


Hi Dave,

This is of interest to me since these Affy files are what we use to make 
annotation packages.  Here we are faced every release with the reality 
that our packages can only be as good as the original files.  So to the 
extent that you successfully lobby Affymetrix to improve them, we will 
also be able to improve our probe and chip packages based on that 
platform.  So thanks for caring enough to talk to Affy about it!  The 
entire community can benefit if you get them to improve things.

   Marc



On 07/24/2014 08:04 AM, David wrote:
> Dear users, in case it is of help to anyone, just for the record, or if someone can share some comments...
>
> I have searched the list for a similar issue but have not found anything related.
>
> We have just performed a Rat 2.0 Gene experiment and have found that the current Affymetrix annotation files (version na34.rn5) have errors in the chromosome, strand, start and end columns. We got in touch with support, who told us that there was an error in the annotation protocol and that they will review it.
>
> In addition to this, we have also detected that different transcript_cluster_IDs interrogate the same genes/sequences, with apparently the same probesets, so there is quite a degree of redundancy. On this, support has given us some explanations. Here is just the start of the email:
>
> "The short answer is that the re-use of probes with the
> same sequence in different transcript clusters is a result of how the Gene
> array design handles genes that have duplicate copies in different parts of the
> genome, or genes that are part of a widespread gene family with regions of
> near-identity. The issues regarding the gene assignments of these probes is a
> side-effect of drift in the transcript record between array design and
> annotation time. Read on for the technical nitty-gritty. [...]"
>
> Finally, we have found that the Rat Gene 2.0 transcript annotation file, which has some 36000 transcript clusters in total, has just below 30000 transcripts defined as "main", of which 11000 have no annotation whatsoever. To me, that is far too many unannotated transcripts.
>
>
>
> HTH
>
> Dave
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
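
For anyone who wants to repeat the counts Dave describes on their own copy of the transcript annotation file, the check can be sketched roughly as below. This is only an illustration under stated assumptions: the column names (transcript_cluster_id, seqname, strand, start, stop, gene_assignment, category), the "---" placeholder for a missing value, and the "#" comment header lines are assumed here and should be checked against the actual file; the sample data is made up and is not taken from na34.rn5.

```python
import csv
import io

# Made-up miniature of a transcript annotation CSV (NOT real na34.rn5 data).
# Assumptions: "#" lines are header comments, "---" marks a missing value,
# and the column names below match the real file.
SAMPLE = '''#%netaffx-annotation-netaffx-build=34
"transcript_cluster_id","seqname","strand","start","stop","gene_assignment","category"
"1","chr1","+","100","200","NM_1 // GeneA // desc","main"
"2","chr1","-","300","400","---","main"
"3","---","---","---","---","---","control->affx"
"4","chr2","+","900","500","NM_2 // GeneB // desc","main"
'''


def read_rows(handle):
    """Parse the CSV into a list of dicts, skipping '#' comment lines."""
    lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(lines))


def summarize(rows):
    """Return (total clusters, 'main' clusters, 'main' clusters without a gene)."""
    main = [r for r in rows if r["category"] == "main"]
    unannotated = [r for r in main if r["gene_assignment"].strip() in ("", "---")]
    return len(rows), len(main), len(unannotated)


def bad_coordinates(rows):
    """Return ids of 'main' clusters whose strand or start/stop look inconsistent."""
    bad = []
    for r in rows:
        if r["category"] != "main":
            continue
        ok = (
            r["strand"] in ("+", "-")
            and r["start"].isdigit()
            and r["stop"].isdigit()
            and int(r["start"]) <= int(r["stop"])
        )
        if not ok:
            bad.append(r["transcript_cluster_id"])
    return bad


rows = read_rows(io.StringIO(SAMPLE))
total, n_main, n_unannotated = summarize(rows)
print(total, n_main, n_unannotated)  # 4 3 1
print(bad_coordinates(rows))         # ['4']
```

To run it on a real file, replace io.StringIO(SAMPLE) with an open file handle, for example open(path, newline=""), where path points at the downloaded annotation CSV. The coordinate check only flags internally inconsistent rows; it cannot tell whether a chromosome or position is actually wrong with respect to the genome build.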


