[R] help on deleting NAs

Petr Pikal petr.pikal at precheza.cz
Fri Feb 18 08:41:51 CET 2005



On 18 Feb 2005 at 13:56, Patrick Connolly wrote:

> On Thu, 17-Feb-2005 at 02:54PM -0600, KeLin at mdanderson.org wrote:
> 
> |> Dear R friends
> |> 
> |> My goal is to eliminate a specific group (group 1) if the number of
> |> NAs in this group is greater than 50% (specifically, say, 3 or more).
> |> Would you please show me how to do it?
> |> I have sample data as follows:
> |>
> |>             y group f1 f2 f3
> |> 30         NA     1  1  1  1
> |> 27         NA     1  1  2  2
> |> 48         NA     1  2  1  2
> |> 40 -0.6066416     1  2  2  1
> |> 24 -0.8323225     1  3  2  2
> |> 25  1.3401226     2  1  1  1
> |> 13  1.2619082     2  1  2  1
> |> 14 -0.4323220     2  3  1  1
> |> 36  0.8406529     2  3  2  2
> |> 21  0.9604758     3  1  2  1
> |> 18  0.9562072     3  2  1  1
> |> 45  1.1285016     3  2  1  1
> |> 50         NA     4  1  1  1
> |> 11         NA     4  1  1  2
> |> 41 -1.1017167     4  2  1  1
> |> 37  0.9661283     4  3  1  1
> |> 39 -0.2540905     4  3  1  2
> |>
> |> Thanks a lot.
> |>
> |> Kevin Lin
> 
> 
> There are probably a lot of niftier ways, but this will give an idea.
> If X is your data frame above,

Hi

I am not sure if it is niftier, but

> x <- read.table("clipboard", header = TRUE)
> x[!x$group %in% names(which(tapply(is.na(x$y), x$group, sum) > 2)), ]
            y group f1 f2 f3
25  1.3401226     2  1  1  1
13  1.2619082     2  1  2  1
14 -0.4323220     2  3  1  1
36  0.8406529     2  3  2  2
21  0.9604758     3  1  2  1
18  0.9562072     3  2  1  1
45  1.1285016     3  2  1  1
50         NA     4  1  1  1
11         NA     4  1  1  2
41 -1.1017167     4  2  1  1
37  0.9661283     4  3  1  1
39 -0.2540905     4  3  1  2
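
A self-contained sketch of the count-based filter above, assuming the sample data is typed in with `read.table(text = ...)` instead of being read from the system clipboard. Looking up the flagged groups by name rather than by the position that `which()` returns keeps the filter correct even when group labels are not 1, 2, 3, ...; the variable names `na_count`, `bad_groups` and `kept` are illustrative only.

```r
# Sample data from the question; the first column becomes the row names
x <- read.table(header = TRUE, text = "
            y group f1 f2 f3
30         NA     1  1  1  1
27         NA     1  1  2  2
48         NA     1  2  1  2
40 -0.6066416     1  2  2  1
24 -0.8323225     1  3  2  2
25  1.3401226     2  1  1  1
13  1.2619082     2  1  2  1
14 -0.4323220     2  3  1  1
36  0.8406529     2  3  2  2
21  0.9604758     3  1  2  1
18  0.9562072     3  2  1  1
45  1.1285016     3  2  1  1
50         NA     4  1  1  1
11         NA     4  1  1  2
41 -1.1017167     4  2  1  1
37  0.9661283     4  3  1  1
39 -0.2540905     4  3  1  2
")

# Number of missing y values per group, labelled by group
na_count <- tapply(is.na(x$y), x$group, sum)

# Labels (as character) of groups with more than 2 missing values
bad_groups <- names(na_count)[na_count > 2]

# Keep rows whose group label is not flagged; %in% matches as character
kept <- x[!x$group %in% bad_groups, ]
print(kept)
```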

Or, if you want to use the 50% margin,

x[!x$group %in% names(which(tapply(is.na(x$y), x$group, sum) /
                            tapply(is.na(x$y), x$group, length) > .5)), ]

gives you what you want.
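
The 50% rule can also be wrapped in a small reusable function, which addresses the case where 3 does not always constitute a majority. This is a sketch, not code from the thread: `drop_na_groups()` and its `max_prop` argument are hypothetical names, and it swaps in `ave()` to attach each group's NA proportion to every row, so no group labels need to be matched.

```r
# Sample data from the question; the first column becomes the row names
x <- read.table(header = TRUE, text = "
            y group f1 f2 f3
30         NA     1  1  1  1
27         NA     1  1  2  2
48         NA     1  2  1  2
40 -0.6066416     1  2  2  1
24 -0.8323225     1  3  2  2
25  1.3401226     2  1  1  1
13  1.2619082     2  1  2  1
14 -0.4323220     2  3  1  1
36  0.8406529     2  3  2  2
21  0.9604758     3  1  2  1
18  0.9562072     3  2  1  1
45  1.1285016     3  2  1  1
50         NA     4  1  1  1
11         NA     4  1  1  2
41 -1.1017167     4  2  1  1
37  0.9661283     4  3  1  1
39 -0.2540905     4  3  1  2
")

# Hypothetical helper: drop every group whose share of missing values
# in column `col` is greater than `max_prop`
drop_na_groups <- function(df, col, group, max_prop = 0.5) {
  # For each row, the proportion of NAs within that row's group
  na_prop <- ave(as.numeric(is.na(df[[col]])), df[[group]], FUN = mean)
  df[na_prop <= max_prop, , drop = FALSE]
}

kept <- drop_na_groups(x, "y", "group", max_prop = 0.5)
print(kept)  # group 1 (3 of 5 NA) is dropped; group 4 (2 of 5 NA) stays
```

With `max_prop = 0.3`, group 4 (40% missing) would be dropped as well, leaving only groups 2 and 3.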

Cheers
Petr






> 
> > aa <- with(X, tapply(y, group, function(x) length(x[is.na(x)])))
> > names(aa[aa > 2])
> [1] "1"
> 
> > X[!with(X, group %in% as.numeric(names(aa[aa > 2]))), ]
>             y group f1 f2 f3
> 6   1.3401226     2  1  1  1
> 7   1.2619082     2  1  2  1
> 8  -0.4323220     2  3  1  1
> 9   0.8406529     2  3  2  2
> 10  0.9604758     3  1  2  1
> 11  0.9562072     3  2  1  1
> 12  1.1285016     3  2  1  1
> 13         NA     4  1  1  1
> 14         NA     4  1  1  2
> 15 -1.1017167     4  2  1  1
> 16  0.9661283     4  3  1  1
> 17 -0.2540905     4  3  1  2
> > 
> 
> The function in the tapply part could be made more general if 3
> doesn't always constitute a majority.
> 
> HTH
> 
> -- 
> Patrick Connolly
> HortResearch
> Mt Albert
> Auckland
> New Zealand 
> Ph: +64-9 815 4200 x 7188
> 

Petr Pikal
petr.pikal at precheza.cz




More information about the R-help mailing list