[R] Remove missings (quick question)

Marc Schwartz marc_schwartz at me.com
Fri Nov 9 18:05:14 CET 2012


On Nov 9, 2012, at 10:50 AM, Eiko Fried <torvon at gmail.com> wrote:

> A colleague wrote the following syntax for me:
> 
> D = read.csv("x.csv")
> 
> ## Convert -999 to NA
> for (k in 1:dim(D)[2]) {
>    I = which(D[,k]==-999)
>    if (length(I) > 0) {
>        D[I,k] = NA
>    }
> }
> 
> The dataset has many missing values. I am running several regressions on
> this dataset, and want to ensure every regression has the same subjects.
> 
> Thus I want to drop subjects listwise for dependent variables y1-y9 and
> covariates x1-x5 (if data is missing on ANY of these variables, drop
> subject).
> 
> How would I do this after running the syntax above?
> 
> Thank you


Modify the initial read.csv() call to:

  D <- read.csv("x.csv", na.strings = "-999")

That will convert all -999 values to NA upon import, so you don't have to post-process the data.

See ?read.csv for more info.
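
As a small illustration of the na.strings approach, here is a sketch using an inline, made-up CSV in place of x.csv (the column names and values are hypothetical). It also keeps "NA" in na.strings, on the assumption that the file might contain literal NA entries as well; drop that if -999 is the only missing-value code.

```r
# Hypothetical three-row CSV standing in for x.csv; -999 marks a missing value.
csv_text <- "id,y1,x1
1,2.5,10
2,-999,12
3,3.1,-999"

# text = lets read.csv() parse a string instead of a file on disk.
# Both "NA" and "-999" are treated as missing on import.
D <- read.csv(text = csv_text, na.strings = c("NA", "-999"))

print(D)
print(colSums(is.na(D)))  # number of NA values per column
```

Note that na.strings is matched against the text of each field, so a value written as -999.0 in the file would not match "-999" and would need its own entry.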

Once that is done, R's default behavior in modeling functions is to remove observations with missing data (e.g., NA values) in any of the variables used in that model. Alternatively, you can pre-process using:

  D.New <- na.omit(D)

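and then use D.New for all of your subsequent analyses. Note that na.omit() drops a row if it has an NA in any column of D, not only in the model variables. See ?na.omit.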
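
Since the question asks for listwise deletion on y1-y9 and x1-x5 only, one possible variant is to restrict the check to those columns with complete.cases(). The sketch below uses a small made-up data frame with fewer variables (y1, y2, x1, plus an unrelated column named note, all hypothetical); with the real data, vars would list y1 through y9 and x1 through x5.

```r
# Hypothetical data: y1 and y2 are outcomes, x1 is a covariate, and 'note' is
# an unrelated column whose missing values should not cause a row to be dropped.
D <- data.frame(
  y1   = c(1.0, NA, 3.2, 4.1, 5.3, 6.0),
  y2   = c(2.0, 2.5, NA, 4.5, 5.1, 6.2),
  x1   = c(10, 11, 12, 13, 14, 15),
  note = c(NA, "a", "b", "c", NA, "d")
)

# Variables that every regression must have complete data on.
vars <- c("y1", "y2", "x1")

# TRUE for rows with no NA in any of the listed variables.
keep <- complete.cases(D[, vars])
D.New <- D[keep, ]

# Every model fit on D.New now uses the same subjects.
fit1 <- lm(y1 ~ x1, data = D.New)
fit2 <- lm(y2 ~ x1, data = D.New)

print(nrow(D.New))              # rows kept when checking only 'vars'
print(nrow(na.omit(D)))         # rows kept when checking every column
print(c(nobs(fit1), nobs(fit2)))
```

Fitting each model on the unfiltered D instead would let R drop a different set of rows per regression, which is what the filtering step avoids.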

Regards,

Marc Schwartz



