[R] how to filter variables which appear in any row but do not include

Rui Barradas ruipbarradas using sapo.pt
Wed Jun 3 21:25:09 CEST 2020


Hello,

I forgot about %in%, maybe because there were regexes in the OP.
And rowSums is much faster than apply.
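
A minimal sketch of why the two row reductions give the same answer, on a small made-up data frame. The names toy, by_rowsums and by_apply are mine and only for illustration; they do not come from the thread.

```r
# Toy data, made up for illustration (not the data from this thread)
toy <- data.frame(V1 = c("E100", "E102", "E109"),
                  V2 = c("E112", "E119", "E10"),
                  stringsAsFactors = FALSE)
unwanted <- c("E102", "E112")

# Logical matrix: TRUE where a cell is exactly one of the unwanted codes
bad <- mapply(function(x) x %in% unwanted, toy)

# rowSums() counts the TRUEs in each row;
# apply(bad, 1, any) asks whether each row has at least one TRUE
by_rowsums <- rowSums(bad) > 0
by_apply   <- apply(bad, 1, any)

# Both flag the same rows
stopifnot(all(by_rowsums == by_apply))

# !rowSums(bad) is TRUE exactly where the count is 0, so this keeps clean rows
toy[!rowSums(bad), ]
```

Note that %in% tests for exact equality, so "E10" is never confused with "E102", whereas an unanchored regex matches substrings.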

In my tests Bert's solution is 7 times faster than mine, even after
changing mine to use %in% instead of grepl while keeping apply(no, 1, any).
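
A rough way to repeat this kind of comparison, as a sketch only. The data size n, the number of repetitions and the function names f_rowsums and f_apply are arbitrary choices of mine, and the timings will differ from machine to machine.

```r
# Larger test set in the same spirit as the example in this thread
# (size and object names are arbitrary choices for this sketch)
set.seed(2020)
codes <- c("E10", "E11", paste0("E", 100:119))
n <- 20000
big <- as.data.frame(replicate(5, sample(codes, n, TRUE)),
                     stringsAsFactors = FALSE)
unwanted <- c("E102", "E112")

# Variant A: %in% plus rowSums
f_rowsums <- function(d) {
  bad <- mapply(function(x) x %in% unwanted, d)
  d[!rowSums(bad), ]
}

# Variant B: %in% plus apply(no, 1, any)
f_apply <- function(d) {
  no <- sapply(d, function(x) x %in% unwanted)
  no <- apply(no, 1, any)
  d[!no, ]
}

# Both variants must keep exactly the same rows
stopifnot(identical(f_rowsums(big), f_apply(big)))

# Elapsed times depend on machine and data size, so no numbers are claimed here
system.time(for (i in 1:20) f_rowsums(big))
system.time(for (i in 1:20) f_apply(big))
```

The only difference between the two variants is the row reduction, so any gap in elapsed time comes from rowSums versus apply.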

Hope this helps,

Rui Barradas

At 18:34 on 03/06/20, Bert Gunter wrote:
> Regexes are not needed. Using Rui's example:
> 
>  > bad <- mapply(function(x) x %in% unwanted,dat)
>  > dat[!rowSums(bad),]
> 
>       V1   V2   V3   V4   V5
> 2  E117 E113 E119 E100  E10
> 4  E114  E11 E119 E119 E114
> 5  E109 E111 E103 E103 E100
> 7  E108 E113 E119 E117  E11
> 8  E114 E105  E10 E109 E110
> 9  E119 E116 E108 E118 E119
> 10 E100 E110 E104 E111 E101
> 13 E111 E116 E101 E110 E116
> 15 E103  E11 E108  E10 E113
> 16 E111 E117 E103 E115 E119
> 17 E104 E110 E104 E117 E114
> 19 E100 E108  E10 E111 E105
> 20 E109 E115 E117 E108 E106
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along 
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Wed, Jun 3, 2020 at 9:57 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> 
>     Hello,
> 
>     If you want to filter out rows with any of the values in an 'unwanted'
>     vector, try the following.
> 
>     First, create a test data set.
> 
>     x <- scan(what = character(), text = '
>     "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
>     "E107" "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118"
>     "E116" "E112"
>     "E117"
>     ')
> 
>     set.seed(2020)
>     dat <- replicate(5, sample(x, 20, TRUE))
>     dat <- as.data.frame(dat)
> 
> 
>     Now, remove all rows that have at least one of "E102" or "E112"
> 
> 
>     unwanted <- c("E102", "E112")
>     no <- sapply(dat, function(x){
>         grepl(paste(unwanted, collapse = "|"), x)
>     })
>     no <- apply(no, 1, any)
>     dat[!no, ]
> 
> 
>     That's it, if I understood the problem.
> 
> 
>     Hope this helps,
> 
>     Rui Barradas
> 
> 
>     At 15:55 on 03/06/20, Ana Marija wrote:
>      > Hello.
>      >
>      > I am trying to filter only rows that have ANY of these variables:
>      > E109, E119, E149
>      >
>      > so I did:
>      > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
>      >
>      > then I checked what I got:
>      >> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
>      >> d0=unlist(s0)
>      >> d10=unique(d0)
>      >> d10
>      >   [1] "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
>      > [11] "E107"
>      > s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
>      > d1=unlist(s1)
>      > d11=unique(d1)
>      >> d11
>      >   [1] "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
>      > [11] "E117"
>      >
>      > I need help with changing this command
>      > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
>      >
>      > so that in the output I do not have any rows that include E102 or E112?
>      >
>      > Thanks
>      > Ana
>      >
>      > ______________________________________________
>      > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>      > https://stat.ethz.ch/mailman/listinfo/r-help
>      > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>      > and provide commented, minimal, self-contained, reproducible code.
>      >
> 


