[R] A query about na.omit

Wed Apr 1 21:19:04 CEST 2009

On Wed, 2009-04-01 at 16:49 +0100, Jose Iparraguirre D'Elia wrote:
> Dear all,
>  
> Say I have the following dataset:
>  
> > DF
>       x   y   z
> [1]   1   1   1
> [2]   2   2   2
> [3]   3   3  NA
> [4]   4  NA   4
> [5]  NA   5   5
>  
> I want to omit all the rows that have an NA in column x or column y (NAs in z should be ignored), so that I get:
>  
>  x  y   z
>  1  1   1
>  2  2   2
>  3  3  NA
>  
> If I use na.omit(DF), I also delete the row in which z is NA, thus obtaining
>  
> x y z
> 1 1 1
> 2 2 2
>  
> But this is not what I want, of course. 
> If I use na.omit(DF[,1:2]), then I obtain
>  
> x y 
> 1 1
> 2 2
> 3 3
>  
> which is OK for the x and y columns, but I lose the corresponding values of z (i.e. 1, 2, NA).
>  
> Any suggestions on how to obtain the desired result efficiently? (The actual dataset has millions of records and almost 50 columns, and I would apply the procedure to 12 of these columns.)
>  
> Sincerely,
>  
> Jose Luis 
>  
> Jose Luis Iparraguirre
> Senior Research Economist 
> Economic Research Institute of Northern Ireland
>  
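A minimal sketch that rebuilds the example from the question as an R data frame and reproduces the two na.omit() calls described there. The name `DF` follows the question; the values are those in its table, and `all_cols` and `xy_only` are illustrative names:

```r
# Rebuild the example data frame from the question
DF <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(1, 2, 3, NA, 5),
  z = c(1, 2, NA, 4, 5)
)

# na.omit() drops every row with an NA in ANY column,
# so row 3 (where only z is NA) is lost as well
all_cols <- na.omit(DF)
nrow(all_cols)   # 2

# Restricting na.omit() to x and y keeps row 3,
# but the result no longer contains column z
xy_only <- na.omit(DF[, c("x", "y")])
names(xy_only)   # "x" "y"
```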

Hi Jose Luis,

I think this script is sufficient for your problem:

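# Rebuild the example data as a 5 x 3 matrix, filled row by row
tab <- matrix(c(1,1,1, 2,2,2, 3,3,NA, 4,NA,4, NA,5,5), ncol = 3, byrow = TRUE)
# Keep rows where neither column 1 nor column 2 is NA; column 3 may still be NA
tab[!is.na(tab[, 1]) & !is.na(tab[, 2]), ]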
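The same idea carries over to the data frame in the question and to a longer list of columns. This is a sketch, assuming the data are in a data frame `DF` and that the columns to check are listed by name in `cols` (a hypothetical variable; for the real data it would hold the 12 relevant column names). complete.cases() from the stats package returns TRUE for rows with no missing values in the columns it is given, so only those columns are tested while all columns are kept in the result:

```r
# Example data frame from the question
DF <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(1, 2, 3, NA, 5),
  z = c(1, 2, NA, 4, 5)
)

# Hypothetical vector naming the columns that must not contain NA
cols <- c("x", "y")

# TRUE for rows with no NA in the selected columns only
keep <- complete.cases(DF[cols])

# Subset the full data frame, so z is retained (NA values included)
result <- DF[keep, , drop = FALSE]
result   # rows 1-3, with z = 1, 2, NA
```

An equivalent logical test is `rowSums(is.na(DF[cols])) == 0`. Both forms are vectorised over rows, with no explicit loop.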

-- 
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil