[R] A query about na.omit

Wed Apr 1 19:11:46 CEST 2009

First input the data frame:

> Lines <- " x  y  z
+   1  1  1
+   2  2  2
+   3  3 NA
+   4 NA  4
+  NA  5  5"
>
> DF <- read.table(textConnection(Lines), header = TRUE)

> # Now use complete.cases on columns x and y only to keep the required rows:
> DF[complete.cases(DF[1:2]),]
  x y  z
1 1 1  1
2 2 2  2
3 3 3 NA
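
For the real data set (about 50 columns, 12 of them to be checked), it may be easier to pick the columns by name than by position. The following is only a sketch under that assumption: the names `cols`, `keep` and `result` are made up for illustration, and the small data frame is rebuilt here just so the example runs on its own.

```r
# Rebuild the example data so this snippet is self-contained
DF <- data.frame(x = c(1, 2, 3, 4, NA),
                 y = c(1, 2, 3, NA, 5),
                 z = c(1, 2, NA, 4, 5))

# Hypothetical vector holding the names of the columns to check for NA;
# with the real data this would list the 12 columns of interest
cols <- c("x", "y")

# One logical value per row: TRUE when none of the checked columns is NA
keep <- complete.cases(DF[cols])

# Keep all columns, but only the rows that passed the check;
# z may still contain NA because it was not among the checked columns
result <- DF[keep, ]
```

Since complete.cases returns a single logical vector with one element per row, the same `keep` vector can be reused to count or inspect the dropped rows, e.g. `sum(!keep)` or `DF[!keep, ]`.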

On Wed, Apr 1, 2009 at 11:49 AM, Jose Iparraguirre D'Elia
<Jose at erini.ac.uk> wrote:
> Dear all,
>
> Say I have the following dataset:
>
>> DF
>        x     y     z
> [1]   1     1     1
> [2]   2     2     2
> [3]   3     3    NA
> [4]   4   NA   4
> [5]  NA  5     5
>
> And I want to omit all the rows which have NA, but only in columns x and y, so that I get:
>
>  x  y  z
> 1  1  1
> 2  2  2
> 3  3  NA
>
> If I use na.omit(DF), I would delete the row for which z=NA, obtaining thus
>
> x y z
> 1 1 1
> 2 2 2
>
> But this is not what I want, of course.
> If I use na.omit(DF[,1:2]), then I obtain
>
> x y
> 1 1
> 2 2
> 3 3
>
> which is OK for the x and y columns, but I wouldn't get the corresponding values for z (i.e. 1, 2, NA)
>
> Any suggestions about how to obtain the desired results efficiently (the actual dataset has millions of records and almost 50 columns, and I would apply the procedure on 12 of these columns)?
>
> Sincerely,
>
> Jose Luis
>
> Jose Luis Iparraguirre
> Senior Research Economist
> Economic Research Institute of Northern Ireland
>