[BioC] Error while mapping SYMBOLS to ENTREZID

Martin Morgan mtmorgan at fhcrc.org
Fri Aug 15 06:45:13 CEST 2014


On 08/14/2014 07:58 PM, Atul Kakrana wrote:
> Hi All,
>
> I am getting a strange error converting Gene Symbols to Entrez ID. Here is my code:
>
>  >testData = read.delim("IL_CellVar.txt",head=T,row.names = 2)
>  > testData[1:5,1:3]
>                 ClustID Genes.Symbol      ChrLoc
> NM_001034168.1       4         Ank2 chrNA:-1--1
> NM_013795.4          4        Atp5l chrNA:-1--1
> NM_018770            4       Igsf4a chrNA:-1--1
> NM_146150.2          4         Nrd1 chrNA:-1--1
> NM_134065.3          4        Epdr1 chrNA:-1--1
>
>  > clustNum = 5
>  > filteredClust = testData[testData$ClustID == clustNum,]
>  > any(is.na(filteredClust$Genes.Symbol))
>
> [1] FALSE
>
>
>  > selectedEntrezIds <- unlist(mget(filteredClust$Genes.Symbol,org.Mm.egSYMBOL2EG))
> Error in unlist(mget(filteredClust$Genes.Symbol, org.Mm.egSYMBOL2EG)) :
>    error in evaluating the argument 'x' in selecting a method for function
> 'unlist': Error in .checkKeysAreWellFormed(keys) :
>    keys must be supplied in a character vector with no NAs
>
> Another approach fails too:
>
>  > selectedEntrezIds = select(org.Mm.eg.db,filteredClust$Genes.Symbol, "ENTREZID")
> Error in .select(x, keys, columns, keytype = extraArgs[["kt"]], jointype =
> jointype) :
>    'keys' must be a character vector
>
> I am not sure why I am getting this error as the master file from which gene
> symbols were extracted for testData gives no problem while converting to
> EntrezID. I would appreciate help with this.

Likely your vector of symbols is a factor rather than a character vector (see the
"stringsAsFactors" argument on the help page ?read.delim).

 > select(org.Mm.eg.db, "Ank2", "ENTREZID", "SYMBOL")
   SYMBOL ENTREZID
1   Ank2   109676
 > select(org.Mm.eg.db, factor("Ank2"), "ENTREZID", "SYMBOL")
Error in .testForValidKeys(x, keys, keytype) :
   'keys' must be a character vector
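
A minimal base-R sketch of the diagnosis and the two usual fixes follows. The tab-delimited text is a hypothetical stand-in for IL_CellVar.txt that mimics only the rows and columns shown in the question. `stringsAsFactors = TRUE` is passed explicitly to reproduce the default of R versions before 4.0.0, which this 2014 thread predates. Since R 4.0.0 the default has been FALSE.

```r
# Hypothetical stand-in for IL_CellVar.txt, using the rows shown in the question
tsv <- paste(
  "ClustID\tAccession\tGenes.Symbol",
  "4\tNM_001034168.1\tAnk2",
  "4\tNM_013795.4\tAtp5l",
  "4\tNM_018770\tIgsf4a",
  "4\tNM_146150.2\tNrd1",
  "4\tNM_134065.3\tEpdr1",
  sep = "\n"
)

# Reproduce the pre-R-4.0.0 behaviour: string columns become factors
asFactors <- read.delim(text = tsv, row.names = 2, stringsAsFactors = TRUE)
class(asFactors$Genes.Symbol)          # "factor", which select()/mget() reject as keys
any(is.na(asFactors$Genes.Symbol))     # FALSE, so an NA check cannot reveal the problem

# Fix 1: ask read.delim for character columns up front
asChars <- read.delim(text = tsv, row.names = 2, stringsAsFactors = FALSE)
class(asChars$Genes.Symbol)            # "character"

# Fix 2: convert an existing factor column before using it as keys
syms <- as.character(asFactors$Genes.Symbol)

# Subset one cluster as in the question (cluster 4 here, since only
# cluster-4 rows are shown); the result is a plain character vector
inCluster <- syms[asFactors$ClustID == 4]

# With org.Mm.eg.db loaded, a character vector like this can then be used as
#   select(org.Mm.eg.db, keys = inCluster, columns = "ENTREZID", keytype = "SYMBOL")
```

Either fix yields a plain character vector, which is what both `select()` and `mget()` expect as keys.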


>
> Thanks
>
> AK


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


