[BioC] remove NA from named character vector
iaingallagher at btopenworld.com
Fri Jul 22 13:03:39 CEST 2011
This is likely a trivial problem but it's annoying me. I am mapping from Bos taurus Ensembl IDs to gene symbols. I can do this with biomaRt, but using the org.Bt.eg.db package means I'm not tied to an internet connection.
A toy example:
library(org.Bt.eg.db)
ens <- c('ENSBTAG00000004218', 'ENSBTAG00000004270', 'ENSBTAG00000004578', 'ENSBTAG00000004608')
egs <- unlist(mget(ens, revmap(org.Bt.egENSEMBL), ifnotfound=NA))
ENSBTAG00000004218 ENSBTAG00000004270 ENSBTAG00000004578 ENSBTAG00000004608
"617660" "407106" NA "100138951"
# a named character vector with one NA
#now get symbols
syms <- unlist(mget(egs, org.Bt.egSYMBOL, ifnotfound=NA))
#throws an error - fair enough - need to drop the NA
which(egs == NA)
#gives named integer(0) - hmm
#gives "character" - so I'm quite confused now.
NA %in% egs
#gives TRUE
How do I identify which entries in 'egs' are NA so I can remove them? It's trivial here, but the dataset I'm working with has thousands of entries.
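For reference, here is a minimal sketch of the behaviour in question. It rebuilds the toy vector by hand from the output shown above, so it runs without the annotation package. The names missing_idx and egs_ok are made up for illustration. In R any comparison with NA using == yields NA rather than TRUE, which would explain the empty result from which(); is.na() tests for missingness directly.

```r
# Toy vector copied from the output above (no annotation package needed)
egs <- c(ENSBTAG00000004218 = "617660",
         ENSBTAG00000004270 = "407106",
         ENSBTAG00000004578 = NA,
         ENSBTAG00000004608 = "100138951")

# == propagates NA: every element of the comparison is NA, never TRUE,
# so which() has nothing to report
cmp <- egs == NA
all(is.na(cmp))

# is.na() identifies the missing entries by position (names are kept)
missing_idx <- which(is.na(egs))

# Drop the NA entries; the Ensembl IDs remain as names
egs_ok <- egs[!is.na(egs)]

# Assumption: the cleaned vector could then be used in the symbol lookup,
# e.g. mget(egs_ok, org.Bt.egSYMBOL, ifnotfound=NA)  (not run here)
```

Indexing with !is.na(egs) is vectorised, so the same line should work unchanged on a vector with thousands of entries.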
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)
 LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
 LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
 LC_MONETARY=C LC_MESSAGES=en_GB.utf8
 LC_PAPER=en_GB.utf8 LC_NAME=C
 LC_ADDRESS=C LC_TELEPHONE=C
 LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
 org.Bt.eg.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5
 AnnotationDbi_1.14.1 Biobase_2.10.0