[BioC] affymetrix annotation

James W. MacDonald jmacdon at med.umich.edu
Tue Dec 21 16:29:29 CET 2010


Hi Arne,

Aha! I didn't know there were two IDs associated with this probeset 
(although I should have thought of that, or you know, looked it up...my 
bad).

So, here's the deal. When there are multiple IDs associated with a given 
probeset, it is difficult to say which one is correct. Or put another 
way, it is difficult to know exactly what the probeset is actually 
measuring.

In the past, what we did was take the first Entrez Gene ID as the 
correct one, and ignore the rest. This of course is suboptimal, as we 
are ignoring information. Recently, we decided to mask these 
questionable mappings by default and only expose probesets that have a 
unique probeset --> Entrez Gene mapping, while still giving people a way 
to get at all the questionable mappings as well.
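
To make the difference between these policies concrete, here is a minimal 
sketch in plain R. It is only a toy illustration, not the actual annotation 
pipeline: the list `raw` stands in for the probeset --> Entrez Gene table, 
its first entry copies the two IDs discussed in this thread, and the second 
probeset and its ID are made up.

```r
# Toy probeset -> Entrez Gene table (an assumption for illustration only).
# "1419590_at" uses the two IDs quoted in this thread; "hypothetical_at"
# and its ID are invented so that there is one unambiguous example.
raw <- list(
  "1419590_at"      = c("100047700", "13094"),
  "hypothetical_at" = c("11111")
)

# Old policy: keep the first ID and silently drop the rest.
first_only <- vapply(raw, function(ids) ids[1], character(1))

# Masking policy: expose only probesets with exactly one ID, NA otherwise.
unique_only <- vapply(
  raw,
  function(ids) if (length(ids) == 1L) ids else NA_character_,
  character(1)
)

# "All" view: keep every ID, ambiguous or not.
all_ids <- raw

# Probesets that the masking policy hides.
ambiguous <- names(raw)[lengths(raw) > 1L]
```

Under these assumptions, `first_only` would report the (since withdrawn) 
100047700 for 1419590_at, `unique_only` reports NA for it, and `all_ids` 
leaves both IDs for the user to sort out.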

If you want to see all the one-to-many mappings, you can use the 
infamous toggleProbes() function thusly:

 > suppressMessages(library(mouse4302.db))
 > a <- "1419590_at"
 > z <- toggleProbes(mouse4302ENTREZID, "all")
 > get(a,z)
[1] "100047700" "13094"
 > q <- toggleProbes(mouse4302SYMBOL, "all")
 > get(a,q)
[1] "LOC100047700" "Cyp2b9"

Best,

Jim



On 12/21/2010 6:15 AM, arne.mueller at novartis.com wrote:
> Hello,
>
> thanks Jim for your reply. The org.Mm.eg.db::org.Mm.egSYMBOL has the gene
> symbol for the probeset of interest, it's just mouse4302 that's missing
> it.
>
> I've just checked older Netaffx annotation files (from 2007 and 2009, plus
> the current one from 2010). The probeset was already annotated in 2007,
> but it was always assigned to two Entrez Gene IDs: 100047700 ///
> 13094. The second one seems to be the correct ID; the first one was
> removed from Entrez Gene on Oct 28th 2010 (flagged "withdrawn"). Note that
> the first ID is the withdrawn one ... If the bioc annotation pipeline
> comes across an ambiguous annotation that it cannot resolve, does it ignore
> the annotation (since there is no unique gene ID available) and therefore
> set the Entrez Gene ID to NA?
>
> Note, the gene ID was flagged "withdrawn" in Entrez Gene in October this
> year, and the Entrez Gene data used in mouse4302 dates from September 2010
> (i.e. the entry was still valid in Entrez Gene, and therefore ambiguous ...).
>
> I'm asking because I found this gene interesting for some reason,
> and it was annotated as Cyp2b9 in our gene expression software (using
> Netaffx) but not annotated at all in one of my post-processing pipelines
> that uses mouse4302.db from bioc.
>
>      kind regards,
>
>      Arne
>
> ps: the probeset I'm talking about is  1419590_at
>
>
>
>
> On 12/20/2010 09:58 PM, "James W. MacDonald" <jmacdon at med.umich.edu> wrote:
>
> Hi Arne,
>
> On 12/20/2010 11:26 AM, arne.mueller at novartis.com wrote:
>> Hello,
>>
>> I was wondering where the mapping from affy probeset IDs to genes
>> (EntrezGene) comes from (package mouse4302.db).
>> mouse4302.db::mouse4302_dbInfo lists all data sources (URLs). However,
>> what's the "original" link between an affy probeset and an EntrezGene ID;
>> is it Netaffx? My problem is that I have an "interesting" probeset that is
>> annotated with an EntrezGene in Netaffx, but the mouse4302SYMBOL map
>> contains NA for this probeset ID.
>>
>> I'd be happy if someone could point me to some documentation on how this
>> annotation/mapping is derived, or let me know what I'm doing wrong ...?
>
> I doubt you are doing anything wrong. This may have changed a bit from
> the last time I talked to Marc, but in the past we needed a 'primary'
> mapping to start with, and chose the Affymetrix probeset ID -->  Entrez
> Gene mapping that you can get from the latest Affy annotation file.
>
> We assumed this to be the truth, and then did all the other mappings
> from Entrez Gene. The problem with this is that we are by definition
> using more dated information than you will typically get from Netaffx.
> So it is quite likely, given the fluid nature of this field, that some
> probesets will have different data in Netaffx than what you can get from
> the annotation packages.
>
> As an aside, does this probeset have an Entrez Gene ID in the annotation
> package?
>
> Best,
>
> Jim
>
>
>
>
>>
>>      thanks a lot for your help,
>>
>>      arne

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826