[BioC] remove NA from named character vector

Fri Jul 22 13:03:39 CEST 2011

Hi List

This is likely a trivial problem, but it's annoying me. I am mapping from Bos taurus Ensembl IDs to symbols. I can do this in biomaRt, but using the org.Bt.eg.db package means I'm not tied to an internet connection.

A toy example:

library(org.Bt.eg.db)
ens <- c('ENSBTAG00000004218', 'ENSBTAG00000004270', 'ENSBTAG00000004578', 'ENSBTAG00000004608')
egs <- unlist(mget(ens, revmap(org.Bt.egENSEMBL), ifnotfound=NA))

egs

ENSBTAG00000004218 ENSBTAG00000004270 ENSBTAG00000004578 ENSBTAG00000004608 
          "617660"           "407106"                 NA        "100138951" 

# a named character vector with one NA

#now get symbols
syms <- unlist(mget(egs, org.Bt.egSYMBOL, ifnotfound=NA))

# throws an error - fair enough - need to drop the NA

which(egs == NA)

#gives named integer(0) - hmm
class(egs)
#gives [1] "character" - so I'm quite confused now.

NA %in% egs
#gives [1] TRUE

How do I identify which entries in 'egs' are NA so I can remove them? It's trivial here, but the dataset I'm working with has thousands of entries.
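
For reference, here is a minimal sketch of one way this could be done in base R. It rests on the standard behaviour that comparing anything to NA with == gives NA rather than TRUE (so which() has nothing to report), whereas is.na() tests for missingness directly. The vector is built by hand from the output above so the snippet runs without org.Bt.eg.db; the names 'is_missing' and 'egs_clean' are made up for illustration.

```r
# Hand-built stand-in for 'egs' (values copied from the printed output above)
egs <- c(ENSBTAG00000004218 = "617660",
         ENSBTAG00000004270 = "407106",
         ENSBTAG00000004578 = NA,
         ENSBTAG00000004608 = "100138951")

# '==' against NA yields NA for every element, never TRUE,
# which is why which(egs == NA) comes back empty
egs == NA

# is.na() returns a logical vector flagging the missing entries
is_missing <- is.na(egs)
which(is_missing)   # position (and name) of each NA entry

# Keep only the non-missing entries; the names are preserved
egs_clean <- egs[!is_missing]
egs_clean
```

The cleaned vector could then presumably be passed to the second lookup in place of 'egs', e.g. mget(egs_clean, org.Bt.egSYMBOL, ifnotfound=NA).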

Thanks

iain

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8    
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8   
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Bt.eg.db_2.5.0   RSQLite_0.9-4        DBI_0.2-5           
[4] AnnotationDbi_1.14.1 Biobase_2.10.0