[BioC] Incomplete EntrezID annotations for the Mouse 430 v2.0 probe-set

Martin Morgan mtmorgan at fhcrc.org
Wed Nov 3 03:40:22 CET 2010


On 11/02/2010 11:20 AM, ANJAN PURKAYASTHA wrote:
> Hi Martin,
> Session Info:
> R version 2.11.1 (2010-05-31)
> i386-apple-darwin9.8.0
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base    
> 
> other attached packages:
>  [1] affy_1.26.1          GOstats_2.14.0       graph_1.28.0        
> Category_2.14.0      mouse4302.db_2.4.1   org.Mm.eg.db_2.4.1  
> RSQLite_0.9-2      
>  [8] DBI_0.2-5            AnnotationDbi_1.10.2 Biobase_2.8.0      
> 
> loaded via a namespace (and not attached):
>  [1] affyio_1.16.0         annotate_1.26.1       genefilter_1.30.0    
> GO.db_2.4.1           GSEABase_1.10.0       preprocessCore_1.10.0
>  [7] RBGL_1.26.0           splines_2.11.1        survival_2.35-8      
> tools_2.11.1          XML_3.1-1             xtable_1.5-6      
> 
> 
> Commands used to create the mapping:
> library(mouse4302.db)
> id <- rownames(allMtb.rma.data.frame)
> map <- mouse4302ENTREZID
> probe_entrezid <- unlist(mget(id, map))
> p <- as.data.frame(probe_entrezid)
> p now has the probeID_entrezID mappings

With R-2-11 I see

> mouse4302()
[...snip...]
mouse4302ENTREZID has 37316 mapped keys (of 45101 keys)
[...snip...]
Date for NCBI data: 2010-Mar1
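
That is 45101 - 37316 = 7785 probe sets without an Entrez ID in this release. A rough sketch of how to count the unmapped probes in your own result follows. It uses a toy named list in place of mget(id, mouse4302ENTREZID), on the assumption that unmapped probes come back as NA, so it runs without the annotation package. The third probe ID is made up for illustration.

```r
# Toy stand-in for mget(id, mouse4302ENTREZID): a named list with NA
# where a probe has no Entrez ID. "0000000_at" is a hypothetical probe ID.
mapped <- list("1430582_at"   = "268281",
               "1455882_x_at" = "319922",
               "0000000_at"   = NA_character_)

# Same flattening step as in the original commands
probe_entrezid <- unlist(mapped)

# Count probes with and without an Entrez ID
n_unmapped <- sum(is.na(probe_entrezid))
n_mapped   <- sum(!is.na(probe_entrezid))

# Keep the probe IDs as a column rather than only as row names
p <- data.frame(probe = names(probe_entrezid),
                entrezid = unname(probe_entrezid),
                stringsAsFactors = FALSE)

# Only the probes that can be passed on to a GO analysis
p_mapped <- p[!is.na(p$entrezid), ]
```

With the real map, comparing n_unmapped against the "mapped keys" line above is a quick check that all of the probe IDs were looked up.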

The current version of R / Bioconductor is R-2-12, where there are 37413
mapped probes from NCBI data of 2010-Sep7. Using biomaRt I get

> library(biomaRt)
> mart = useMart("ensembl", "mmusculus_gene_ensembl")
> attrs = listAttributes(mart)
> attrs[grep("(Entrez|Affy mouse)", attrs[[2]]),]
               name      description
47       entrezgene    EntrezGene ID
95  affy_mouse430_2  Affy mouse430 2
96 affy_mouse430a_2 Affy mouse430a 2
> filts = listFilters(mart)
> filts[grep("(Entrez|Affy mouse)", filts[[2]]),]
                name                              description
52   with_entrezgene                    with EntrezGene ID(s)
84        entrezgene        EntrezGene ID(s) [e.g. 100287163]
121  affy_mouse430_2  Affy mouse430 2 ID(s) [e.g. 1426088_at]
122 affy_mouse430a_2 Affy mouse430a 2 ID(s) [e.g. 1426088_at]
> res = getBM(c("affy_mouse430_2","entrezgene"), "with_entrezgene",
              TRUE, mart)
> head(res)
  affy_mouse430_2 entrezgene
1                     338371
2                     238944
3                     208431
4      1430582_at     268281
5      1458594_at     268281
6    1455882_x_at     319922
> head(table(table(res[[1]])))

    1     2     3     4     5     6
24627  1746   374    96    62    34

which tells me there are 24627 probes that map to exactly one Entrez ID,
and some more that could be retrieved with some work. (I haven't checked
my biomaRt work very carefully here, so I could have made mistakes, and I
don't know biomaRt well enough to get the provenance of the probes I have
identified, unlike with mouse4302.db, where ?mouse4302ENTREZID is
helpful.) I could also remap the probes using chromosome coordinates from
the mouse4302 package and BSgenome / Biostrings, and then use
org.Mm.eg.db to map coordinates to genes. So I think the best you can do
easily is the ~37,000 probes that are already mapped.
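
To spell out the table(table(...)) idiom used above, here is a minimal sketch on a toy data frame standing in for the getBM() result. The probe IDs and most Entrez IDs are taken from the head(res) output; the value 100001 is invented so that one probe has two IDs. Rows with an empty probe ID would otherwise all be counted together under "", so the sketch drops them first.

```r
# Toy stand-in for the getBM() result. 100001 is a made-up Entrez ID.
res <- data.frame(
  affy_mouse430_2 = c("", "", "1430582_at", "1458594_at",
                      "1455882_x_at", "1455882_x_at"),
  entrezgene      = c(338371, 238944, 268281, 268281, 319922, 100001),
  stringsAsFactors = FALSE)

# Drop rows where Ensembl reports an Entrez ID but no probe on this array
res <- res[res$affy_mouse430_2 != "", ]

# Inner table: number of Entrez IDs reported for each probe
per_probe <- table(res$affy_mouse430_2)

# Outer table: how many probes have 1, 2, ... Entrez IDs
counts <- table(per_probe)

# Probes that map to exactly one Entrez ID
unique_probes <- names(per_probe)[per_probe == 1]
```

On the real result, the first entry of counts corresponds to the 24627 figure, and unique_probes would be the set to compare against the mouse4302.db mapping.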

Martin

> 
> Thanks,
> Anjan
> 
> 
> On Tue, Nov 2, 2010 at 2:16 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> 
>     On 11/02/2010 11:14 AM, ANJAN PURKAYASTHA wrote:
>     > Hi,
>     > I have run into the following problem. I created a probeID-EntrezID
>     > mapping for the Affy mouse array from the cognate annotation file
>     > mouse4302.db.
>     > Unfortunately about 10000 genes do not have a corresponding EntrezID.
>     > Many of these are genes with known functions. If I cannot map an
>     > EntrezID to these then I cannot retrieve GO annotations and
>     > consequently I cannot do a Gene Set Enrichment analysis using GOstats.
>     > Does anyone have an updated annotation file?
> 
>     Hi Anjan
> 
>     What is your sessionInfo() (else how could we know what an 'updated'
>     annotation file is?) and how did you perform the mapping (short,
>     hopefully reproducible, code)?
> 
>     Martin
> 
>     > Many thanks in advance,
>     > Anjan
>     >
> 
> 
>     --
>     Computational Biology
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> 
>     Location: M1-B861
>     Telephone: 206 667-2793
> 
> 
> 
> 
> -- 
> ===================================
> anjan purkayastha, phd.
> research associate
> fas center for systems biology,
> harvard university
> 52 oxford street
> cambridge ma 02138
> phone-703.740.6939
> ===================================


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


