[BioC] GSEABase how to map gene symbols to mouse EntrezId or Affy

Martin Morgan mtmorgan at fhcrc.org
Thu May 15 20:47:40 CEST 2008


"Vladimir Morozov" <vmorozov at als.net> writes:

> Martin,
>
> You are right that disagreement between human and mouse symbols is the
> problem. But you should still get some mapping if you translate symbols
> into capwords
>> sum(!is.na(mget(gss[[1]]@geneIds,org.Mm.egSYMBOL2EG,ifnotfound=NA)))
> [1] 0

Always use accessors, geneIds(gss[[1]]), ...

> sum(!is.na(mget(capwords(tolower(gss[[1]]@geneIds)),org.Mm.egSYMBOL2EG,ifnotfound=NA)))
> [1] 46

Be nice to your helpers and post complete examples; I guess capwords is

> capwords <- function(x) sub("^([a-z])", "\\U\\1", x, perl=TRUE)

then

> cids <- capwords(tolower(geneIds(gss[[1]])))
> egids <- mget(cids, org.Mm.egSYMBOL2EG, ifnotfound=NA)
> egids <- egids[!is.na(egids)]
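
For readers without the annotation packages installed, here is a minimal base-R sketch of the same capitalise-then-look-up steps. The symbols, the identifiers, and the toy_symbol2eg environment are all made-up placeholders standing in for org.Mm.egSYMBOL2EG; they are not real Entrez data.

```r
# Capitalise the first letter only, as in the capwords() helper above.
capwords <- function(x) sub("^([a-z])", "\\U\\1", x, perl = TRUE)

# Hypothetical upper-case symbols (placeholders, not taken from gss).
human_like <- c("ABC1", "XYZ2", "FOO3")
cids <- capwords(tolower(human_like))

# Toy stand-in for org.Mm.egSYMBOL2EG: symbol -> made-up identifier.
toy_symbol2eg <- list2env(list(Abc1 = "1001", Foo3 = "1003"),
                          envir = new.env(parent = emptyenv()))

# Look up every symbol, returning NA where there is no entry,
# then drop the symbols that could not be mapped.
egids <- mget(cids, envir = toy_symbol2eg, ifnotfound = NA)
egids <- egids[!is.na(egids)]
```

Matching on capitalisation alone only finds genes whose mouse symbol happens to be the human symbol in capwords form, so an ortholog table would be needed for anything more complete.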


> Let's say I will figure out some mapping using ortholog or alias names.
> Will I screw the GeneSet data structure by
> gss2 <- lapply(gss,function(x){x@geneIds <-
> my.mapping(x@geneIds);x@geneIdType@type <- 'EntrezIdentifier'})

More on this below...  mapIdentifiers provides a convenient side door
in the form of

> showMethods('mapIdentifiers', class='environment')
Function: mapIdentifiers (package GSEABase)
what="GeneColorSet", to="GeneIdentifierType", from="environment"
what="GeneSet", to="GeneIdentifierType", from="environment"

which is to say that if you have a custom mapping you can represent it
as an environment with keys equal to the identifiers you're mapping
from and values the identifiers you're mapping to, e.g.,

> names(egids) <- toupper(names(egids))
> env <- l2e(egids)
> mapIdentifiers(gss[[1]], EntrezIdentifier(), env)

You probably want to record information about the identifiers you are
mapping to, e.g., that they are mouse, by using as the second argument
EntrezIdentifier('org.Mm.eg.db')
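
As a rough illustration of what such a map looks like, the sketch below builds one with base R only, using list2env() where the example above uses l2e() from Biobase. The symbols and identifiers are invented placeholders, and nothing here calls GSEABase itself.

```r
# Toy result of a symbol lookup: names are mouse-style symbols,
# values are made-up identifiers (placeholders only).
egids <- list(Abc1 = "1001", Foo3 = "1003")

# Keys must match the identifiers currently stored in the gene set,
# which in this thread are upper-case symbols.
names(egids) <- toupper(names(egids))

# Base-R alternative to Biobase's l2e(): list -> environment.
env <- list2env(egids, envir = new.env(parent = emptyenv()))

# Inspect the map: keys are the 'from' ids, values the 'to' ids.
from_ids <- sort(ls(env))
to_ids <- unlist(mget(from_ids, envir = env))
```

Checking from_ids against geneIds() of the set before calling mapIdentifiers is a cheap way to see how many identifiers the custom map will actually cover.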

There doesn't seem to be a method defined for gene set collections (an
oversight), but you can

> GeneSetCollection(lapply(gss, mapIdentifiers, EntrezIdentifier(), env))

back to...

> gss2 <- lapply(gss,function(x){x@geneIds <-
> my.mapping(x@geneIds);x@geneIdType@type <- 'EntrezIdentifier'})

There are a bunch of ways through this, but I would avoid using direct
slot access. One possibility would be

> my.mapping <- force
> gss2 <- GeneSetCollection(lapply(gss, function(x) {
>     GeneSet(EntrezIdentifier('org.Mm.eg.db'),
>             geneIds=my.mapping(geneIds(x)),
>             setName=setName(x))
> }))

Martin

> ?
>
>
>
> Vladimir Morozov 
>
>
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
> Sent: Thursday, May 15, 2008 12:56 PM
> To: Vladimir Morozov
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] GSEABase how to map gene symbols to mouse EntrezId
> or Affy
>
> Hi Vladimir --
>
> "Vladimir Morozov" <vmorozov at als.net> writes:
>
>> Hi
>>  
>> Any suggestions how to map gene symbols to mouse EntrezId (preferred)
>> or Affy.
>> Mapping to Entrez apparently is not supported by GSEABase
>>> mapIdentifiers(gss,EntrezIdentifier())
>> Error in .mapIdentifiers_isMappable(from, to) : 
>>   unable to map from 'Symbol' to 'EntrezId'
>>     neither GeneIdentifierType has annotation
>
> mapIdentifiers needs to know where to look for the map. I guess the way
> you created gss means that it doesn't know about the organism you're
> using, and EntrezIdentifier() also doesn't. What you want is
>
>> mapIdentifiers(gss, EntrezIdentifier("org.Mm.eg.db"))
> GeneSetCollection
>   names: chr5q23, chr16q24 (2 total)
>   unique identifiers:  (0 total)
>   types in collection:
>     geneIdType: EntrezIdentifier (1 total)
>     collectionType: BroadCollection (1 total)
>
> Here I'm using (and I guess you are too) the gss that comes from
> example(getBroadSets). These are human genes, and have no corresponding
> mouse equivalents (see below)...
>
>> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ..., 
>> verbose = verbose)) :
>>   error in evaluating the argument 'object' in selecting a method for 
>> function 'GeneSetCollection'
>>  
>>  
>> Mapping to Affys works for human, but not for mouse
>>> mapIdentifiers(gss, AnnotationIdentifier("hgu95av2.db"))
>> GeneSetCollection
>>   names: chr5q23, chr16q24 (2 total)
>>   unique identifiers: 35089_at, 35090_g_at, ..., 35807_at (79 total)
>>   types in collection:
>>     geneIdType: AnnotationIdentifier (1 total)
>>     collectionType: BroadCollection (1 total)
>>> mapIdentifiers(gss, AnnotationIdentifier("mouse4302.db"))
>> GeneSetCollection
>>   names: chr5q23, chr16q24 (2 total)
>>   unique identifiers:  (0 total)
>>   types in collection:
>>     geneIdType: AnnotationIdentifier (1 total)
>>     collectionType: BroadCollection (1 total)
>
> This is because the identifiers are not in mouse
>
>> ids <- unique(unlist(geneIds(gss)))
>> egs <- mget(ids, revmap(mouse4302ENTREZID), ifnotfound=NA) 
>> sum(!sapply(egs, is.na))
> [1] 0
>
>>> 
>>  
>>  
>> Thanks
>>  
>>
>> Vladimir Morozov
>>
>>
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center 1100
> Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>



