[R] removing NA from a data frame

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Jun 22 12:11:13 CEST 2012


On 22/06/2012 09:41, Stuart Leask wrote:
> Removing rows with NAs, using na.omit(), doesn't seem to be working for me.

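It won't if NA is a level of the factor, which is what you seem to have
here.  table() omits genuine NAs by default, e.g.

 > table(as.factor(c(1,2,NA)))

1 2
1 1

so an NA column in your table() output means that NA is one of the
levels of dg rather than a missing value.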
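
One way to check this is sketched below on made-up data. The vector dg
here is a toy stand-in, not the poster's ex10s$dg, and it assumes the
level is the literal string "NA".

```r
# Toy factor in which "NA" is a level label, not a missing value
# (illustrative data only, not taken from ex10s).
dg <- factor(c("0", "1", "NA", "1"))

lv        <- levels(dg)        # "NA" is listed among the levels
n_missing <- sum(is.na(dg))    # counts elements that are really missing
tab       <- table(dg)         # the "NA" level gets its own column

print(lv)
print(n_missing)
print(tab)
```

If sum(is.na()) is zero while "NA" shows up in levels(), then na.omit()
and complete.cases() have nothing to remove.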

> Dataset:
>
>> str ( ex10s )
>
> 'data.frame':   2189576 obs. of  5 variables:
> $ LOPNR  : int  58 58 58 58 64 64 64 64 64 64 ...
> $ DIAGNOS: Factor w/ 173 levels "F20","F200","F2000",..: 128 128 128 128 105 105 105 160 105 105 ...
> $ X_DATE : int  20060821 20061207 20080102 20090904 20010327 20010925 20020307 20021007 20021007 20030320 ...
> $ SOURCE : int  2 2 2 2 2 2 2 2 2 1 ...
> $ dg     : Factor w/ 7 levels "0","1","2","3",..: 6 6 6 6 5 5 5 6 5 5 ...
>
> The only NAs are in the factor dg (put in by 'recode' from the car library; I'm trying to eliminate cases with particular factor levels)
>
>> table ( ex10s$dg )
>
>        0       1       2       3       4       5      NA
>     2851  271501   63112   98425  335593 1257299  160795
>
> So, I remove the rows with NAs, to a new dataframe ex10ss:
>
>> ex10ss<-na.omit(ex10s)
>
> Check all the NAs have been removed:
>
>> table(ex10ss$dg)
>
>        0       1       2       3       4       5      NA
>     2851  271501   63112   98425  335593 1257299  160795
>
>> dim(ex10s)
> [1] 2189576       5
>> dim(ex10ss)
> [1] 2189576       5
>
> Nothing seems to have changed. I want all the rows with NA in removed.
>
> I am clearly doing something wrong.
>
> The only alternative I could find is pretty similar:
> use <- complete.cases ( ex10 )
> ex10ss<-ex10s[use,]
> which leads to the same result.
>
>
> Stuart
>
>
> Dr Stuart John Leask DM FRCPsych MB Mchir
> Clinical Senior Lecturer and Honorary Consultant Psychiatrist
> Institute of Mental Health, Innovation Park
> Triumph Road, Nottingham, Notts. NG7 2TU. UK
> Tel. +44 115 82 30419 stuart.leask at nottingham.ac.uk
> Google 'Dr Stuart Leask'
>
>
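
A minimal sketch of one possible fix, again on made-up data: the data
frame ex is hypothetical, and the code assumes the unwanted level is the
literal string "NA". The idea is to turn that level into genuine missing
values first, after which na.omit() behaves as expected.

```r
# Hypothetical small data frame standing in for ex10s.
ex <- data.frame(id = 1:5,
                 dg = factor(c("0", "1", "NA", "5", "NA")))

n_before <- nrow(na.omit(ex))      # nothing is missing yet, so no rows go

# Mark the entries carrying the "NA" level as genuinely missing.
is.na(ex$dg) <- ex$dg == "NA"

# Now na.omit() drops those rows; droplevels() removes the unused level.
ex2 <- droplevels(na.omit(ex))

print(n_before)
print(nrow(ex2))
print(levels(ex2$dg))
```

The same approach should apply to ex10s$dg, provided the level there is
really labelled "NA"; check levels(ex10s$dg) first.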


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


