[BioC] Duplicated probesets for the same gene

Tue Apr 25 15:55:52 CEST 2006

Thanks to Sean, Bin, David and Jim, I now have a much better
understanding of the issues.

I am going to try the re-mapped CDFs.

Sincerely,

Saroj

James W. MacDonald wrote:
> As Sean mentioned, there are many possible reasons for multiple
> probesets. First, they may be intended to interrogate splice variants.
> Second, these probesets are based on UniGene build 95, which is very old
> (the current build is #190). In the intervening period, many ESTs and
> RIKEN genes may have been mapped to genes that already existed on the chip.
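To check how common this is on a given array, you can group probesets by the gene they are annotated to and keep the genes that have more than one probeset. The Python sketch below does that. It assumes a hypothetical input: a plain dict mapping each probeset ID to a gene identifier, or None if the probeset is unannotated. The IDs in the example are made up for illustration and do not come from any real annotation package.

```python
from collections import defaultdict


def genes_with_multiple_probesets(probeset_to_gene):
    """Return {gene: [probeset IDs]} for genes measured by 2+ probesets.

    probeset_to_gene: hypothetical dict mapping probeset ID -> gene
    identifier (None when the probeset has no gene annotation).
    """
    by_gene = defaultdict(list)
    for probeset, gene in probeset_to_gene.items():
        if gene is not None:  # skip unannotated probesets
            by_gene[gene].append(probeset)
    # Keep only genes interrogated by more than one probeset
    return {gene: sorted(ps) for gene, ps in by_gene.items() if len(ps) > 1}


if __name__ == "__main__":
    # Made-up IDs for illustration only, not real array annotations
    annotation = {
        "ps1_at": "GeneA",
        "ps2_at": "GeneA",
        "ps3_at": "GeneB",
        "ps4_s_at": "GeneC",
        "ps5_at": "GeneC",
        "ps6_at": None,
    }
    print(genes_with_multiple_probesets(annotation))
    # {'GeneA': ['ps1_at', 'ps2_at'], 'GeneC': ['ps4_s_at', 'ps5_at']}
```

This only shows that duplicates exist. It cannot tell you which of the reasons above applies to a particular gene.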
>
> In addition, many of the probesets contain probes that are now known to 
> either interrogate unrelated sequences or not map to any known sequence.
>
> You can now download, directly from BioC, the re-mapped CDFs provided by
> the Molecular and Behavioral Neuroscience Institute (MBNI) at the
> University of Michigan. These CDFs contain probesets that have been
> re-mapped based on the current UniGene, Ensembl, Entrez Gene, RefSeq, or
> TIGR annotations. Using these CDFs has two benefits. First, there is
> only one probeset per gene (this may not be true of the RefSeq-based
> CDFs; I think there may be some redundancy there, but I am not sure).
> Second, any probe that interrogates multiple transcripts or no longer
> maps to the genome has been removed, so in theory you should get
> better data.
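The Python sketch below illustrates the re-mapping idea of dropping non-specific or unmapped probes and regrouping the rest into one probeset per gene. It is a simplification and not the MBNI pipeline. It assumes a hypothetical input that records, for each probe, the set of genes its sequence matches under current annotation. It works at the gene level, whereas the description above talks about transcripts.

```python
from collections import defaultdict


def remap_probes(probe_to_genes):
    """Build one probeset per gene from probe-level matches.

    probe_to_genes: hypothetical dict mapping probe ID -> set of genes
    the probe sequence matches. Probes matching no gene (no longer map)
    or more than one gene (not specific) are discarded.
    """
    probesets = defaultdict(list)
    for probe, genes in probe_to_genes.items():
        if len(genes) != 1:
            continue  # unmapped or ambiguous probe: drop it
        (gene,) = genes  # unpack the single matching gene
        probesets[gene].append(probe)
    return {gene: sorted(probes) for gene, probes in probesets.items()}


if __name__ == "__main__":
    # Made-up probes for illustration only
    probes = {
        "p1": {"GeneA"},
        "p2": {"GeneA"},
        "p3": {"GeneA", "GeneB"},  # hits two genes: removed
        "p4": set(),               # maps nowhere: removed
        "p5": {"GeneB"},
    }
    print(remap_probes(probes))
    # {'GeneA': ['p1', 'p2'], 'GeneB': ['p5']}
```

Each gene then appears exactly once as a key. This corresponds to the "one probeset per gene" property, at the cost of discarding ambiguous probes.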
>
> The major downside (for me at least) is the loss of the easy preprocess 
> ==> analyze ==> annotate pipeline provided by the affy, limma, and 
> annaffy packages. However, Steffen Durinck has kindly modified his
> biomaRt code to allow for an alternative affy ==> limma ==> biomaRt ==>
> annotate analysis pipeline. Anybody interested can take a look at the
> prettyOutput vignette in biomaRt.
>
> Best,
>
> Jim