[BioC] remove NA from named character vector

axel.klenk at actelion.com axel.klenk at actelion.com
Fri Jul 22 13:11:19 CEST 2011


Hi Iain,

You cannot test for NA with the == operator, because any comparison with NA 
itself evaluates to NA; you'll have to use is.na() instead, e.g.

which(is.na(egs))

or, if you just want to get rid of them:

na.omit(egs)
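
To make the difference concrete, here is a minimal sketch in base R. It builds a stand-in for 'egs' by hand, with the names and values copied from the output quoted below, so it runs without org.Bt.eg.db; the variable names cmp, na_idx, egs_clean and egs_omit are only for illustration.

```r
# Hand-built stand-in for 'egs' (an assumption, so the example runs offline).
egs <- c(ENSBTAG00000004218 = "617660",
         ENSBTAG00000004270 = "407106",
         ENSBTAG00000004578 = NA,
         ENSBTAG00000004608 = "100138951")

# Comparing with NA yields NA for every element, never TRUE,
# which is why which(egs == NA) comes back empty.
cmp <- egs == NA
all(is.na(cmp))
length(which(cmp))

# is.na() returns a real logical vector, so which() can locate the NA.
na_idx <- which(is.na(egs))
names(na_idx)

# Logical subsetting drops the NA and keeps the names.
egs_clean <- egs[!is.na(egs)]

# na.omit() drops the same element, and additionally records what was
# removed in an "na.action" attribute on the result.
egs_omit <- na.omit(egs)
```

Either egs_clean or egs_omit should then be usable as the keys for the second mget() call; plain subsetting may be the tidier choice if the extra attribute is unwanted.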

HTH,

 - axel


Axel Klenk
Research Informatician
Actelion Pharmaceuticals Ltd / Gewerbestrasse 16 / CH-4123 Allschwil / 
Switzerland




From: Iain Gallagher <iaingallagher at btopenworld.com>
To: bioconductor <bioconductor at stat.math.ethz.ch>
Date: 22.07.2011 13:03
Subject: [BioC] remove NA from named character vector
Sent by: bioconductor-bounces at r-project.org



Hi List

This is likely a trivial problem but it's annoying me. I am mapping from 
Bos taurus Ensembl IDs to symbols. I can do this in biomaRt, but using the 
org.Bt.eg.db package means I'm not tied to an internet connection. 

A toy example:

library(org.Bt.eg.db)
ens <- c('ENSBTAG00000004218', 'ENSBTAG00000004270', 'ENSBTAG00000004578', 
'ENSBTAG00000004608')
egs <- unlist(mget(ens, revmap(org.Bt.egENSEMBL), ifnotfound=NA))

egs

ENSBTAG00000004218 ENSBTAG00000004270 ENSBTAG00000004578 ENSBTAG00000004608 
          "617660"           "407106"                 NA        "100138951" 

# a named character vector with one NA

#now get symbols
syms <- unlist(mget(egs, org.Bt.egSYMBOL, ifnotfound=NA))

#throws an error - fair enough - need to drop the NA

which(egs == NA)

#gives named integer(0) - hmm
class(egs)
#gives [1] "character" - so I'm quite confused now.

NA %in% egs
#gives [1] TRUE


How do I identify which entries in 'egs' are NA so I can remove them? It's 
trivial here but the dataset I'm working with is in the thousands.

Thanks

iain

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C 
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8 
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8 
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C 
 [9] LC_ADDRESS=C              LC_TELEPHONE=C 
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base 

other attached packages:
[1] org.Bt.eg.db_2.5.0   RSQLite_0.9-4        DBI_0.2-5 
[4] AnnotationDbi_1.14.1 Biobase_2.10.0 





