[R] Subset according to groups NA proportion within specific variables

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Mon Feb 21 13:23:01 CET 2011


One way is the following:

DF <- data.frame(x = c(rep(1,3),rep(2,4),rep(3,5)),
     y = rnorm(12), z = c(3,4,5,NA,NA,NA,NA,1,2,1,2,1),
     w = c(1,2,3,3,4,3,5,NA,5,NA,7,8)
)

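# TRUE where a response value (y, z, w) is missing
na.ind <- sapply(DF[-1], is.na)
# proportion of NAs per group and column; "does not exceed 50%" means <= 0.5
na.ind <- ave(na.ind, rep(DF$x, ncol(na.ind)), col(na.ind)) <= 0.5
# keep the rows whose group passes for every response variable
DF[apply(na.ind, 1, all), ]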
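
If it helps to look at the per-group NA proportions before subsetting, the
same filter can also be written with aggregate(). This is only a sketch of an
alternative, not a replacement for the code above: the names resp, na.prop,
keep and DF.sub are made up for illustration, and set.seed() is added just so
that y is reproducible.

```r
# Same example data as above; the seed is an addition for reproducibility.
set.seed(1)
DF <- data.frame(x = c(rep(1, 3), rep(2, 4), rep(3, 5)),
                 y = rnorm(12),
                 z = c(3, 4, 5, NA, NA, NA, NA, 1, 2, 1, 2, 1),
                 w = c(1, 2, 3, 3, 4, 3, 5, NA, 5, NA, 7, 8))

resp <- c("y", "z", "w")   # response columns to check (illustrative name)

# One row per group: proportion of NAs in each response variable
na.prop <- aggregate(as.data.frame(is.na(DF[resp])),
                     by = list(x = DF$x), FUN = mean)

# Groups in which no response variable exceeds 50% NAs
keep <- na.prop$x[apply(na.prop[resp] <= 0.5, 1, all)]

# All cases belonging to the retained groups
DF.sub <- DF[DF$x %in% keep, ]
```

For the example data, group 2 is dropped because z is entirely NA there,
while group 3 stays because only 2 of its 5 w values (40%) are missing.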


I hope it helps.

Best,
Dimitris


On 2/21/2011 12:20 PM, D. Alain wrote:
> Dear R-List,
>
> I have a dataframe with one grouping variable (x) and three response variables (y,z,w).
>
> df<-data.frame(x=c(rep(1,3),rep(2,4),rep(3,5)),y=rnorm(12),z=c(3,4,5,NA,NA,NA,NA,1,2,1,2,1),w=c(1,2,3,3,4,3,5,NA,5,NA,7,8))
>
>> df
>   x           y  z  w
>   1  0.29306106  3  1
>   1  0.54797780  4  2
>   1 -1.38365548  5  3
>   2 -0.20407986 NA  3
>   2 -0.87322574 NA  4
>   2 -1.23356250 NA  3
>   2  0.43929374 NA  5
>   3  1.16405483  1 NA
>   3  1.07083464  2  5
>   3 -0.67463191  1 NA
>   3 -0.66410552  2  7
>   3 -0.02543358  1  8
>
> Now I want to make a new dataframe df.sub comprising only the cases that belong to
>   groups in which the proportion of NAs does not exceed 50% in any of the response variables y, z, w.
>
> In the above example, e.g., this would be a dataframe with all cases of the groups 1 and 3 (since there are 100% NAs in z for group 2)
>
>> df.sub
>   x           y  z  w
>   1  0.29306106  3  1
>   1  0.54797780  4  2
>   1 -1.38365548  5  3
>   3  1.16405483  1 NA
>   3  1.07083464  2  5
>   3 -0.67463191  1 NA
>   3 -0.66410552  2  7
>   3 -0.02543358  1  8
>
> Please excuse me if the problem has already been treated somewhere, but so far I have not been able to find the right thread for my question on RSeek.
>
> Can anyone help?
>
> Thanks in advance!
>
> D. Alain

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/


