[R] how to filter variables which appear in any row but do not include

Bert Gunter bgunter.4567 at gmail.com
Wed Jun 3 19:34:47 CEST 2020


Regular expressions are not needed; exact matching with %in% is enough.
Using Rui's example:

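> # TRUE wherever a cell exactly matches one of the unwanted values
> bad <- mapply(function(x) x %in% unwanted, dat)
> # keep only the rows that contain no unwanted value
> dat[!rowSums(bad), ]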

     V1   V2   V3   V4   V5
2  E117 E113 E119 E100  E10
4  E114  E11 E119 E119 E114
5  E109 E111 E103 E103 E100
7  E108 E113 E119 E117  E11
8  E114 E105  E10 E109 E110
9  E119 E116 E108 E118 E119
10 E100 E110 E104 E111 E101
13 E111 E116 E101 E110 E116
15 E103  E11 E108  E10 E113
16 E111 E117 E103 E115 E119
17 E104 E110 E104 E117 E114
19 E100 E108  E10 E111 E105
20 E109 E115 E117 E108 E106
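
The same idea extends to the original question, which also asked to keep
only rows containing any of E109, E119 or E149. What follows is only a
minimal sketch under stated assumptions: the small data frame is made up
for illustration (it is not the data from this thread), and the names
wanted, has_wanted, has_unwanted and keep are hypothetical.

```r
# Made-up example data, not the data set used in the thread
dat <- data.frame(
  V1 = c("E109", "E100", "E119", "E101"),
  V2 = c("E101", "E102", "E112", "E103"),
  V3 = c("E105", "E119", "E104", "E106"),
  stringsAsFactors = FALSE
)

wanted   <- c("E109", "E119", "E149")
unwanted <- c("E102", "E112")

# Logical matrices: TRUE where a cell exactly equals one of the codes
has_wanted   <- sapply(dat, function(col) col %in% wanted)
has_unwanted <- sapply(dat, function(col) col %in% unwanted)

# Keep rows with at least one wanted code and no unwanted code
keep <- rowSums(has_wanted) > 0 & rowSums(has_unwanted) == 0
controls <- dat[keep, ]
print(controls)
```

Only the first row survives here: rows 2 and 3 contain a wanted code but
also an unwanted one, and row 4 contains no wanted code. Note that with a
single-row data frame sapply() returns a vector rather than a matrix, so
rowSums() would fail and the sketch would need adjusting.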

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Wed, Jun 3, 2020 at 9:57 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:

> Hello,
>
> If you want to filter out rows with any of the values in an 'unwanted'
> vector, try the following.
>
> First, create a test data set.
>
> x <- scan(what = character(), text = '
> "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
> "E107" "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116"
> "E112"
> "E117"
> ')
>
> set.seed(2020)
> dat <- replicate(5, sample(x, 20, TRUE))
> dat <- as.data.frame(dat)
>
>
> Now, remove all rows that have at least one of "E102" or "E112":
>
>
> unwanted <- c("E102", "E112")
> no <- sapply(dat, function(x){
>    grepl(paste(unwanted, collapse = "|"), x)
> })
> no <- apply(no, 1, any)
> dat[!no, ]
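
One point worth keeping in mind with the grepl() approach above: an
unanchored pattern such as "E102|E112" matches any string that merely
contains one of the codes. With the codes in this thread that makes no
difference, but it could with longer codes. A minimal sketch of the
difference, assuming the made-up codes "E1020" and "XE112" (they do not
occur in the thread) and the hypothetical names loose, anchored and exact:

```r
# "E1020" and "XE112" are hypothetical codes, added only to show the effect
codes    <- c("E10", "E102", "E1020", "E112", "XE112", "E11")
unwanted <- c("E102", "E112")

# Unanchored alternation: TRUE for any string that contains a pattern
loose <- grepl(paste(unwanted, collapse = "|"), codes)

# Anchored alternation: the whole string must equal one of the patterns
anchored <- grepl(paste0("^(", paste(unwanted, collapse = "|"), ")$"), codes)

# Exact set membership, no regular expressions involved
exact <- codes %in% unwanted

print(data.frame(codes, loose, anchored, exact))
```

In this sketch loose also flags "E1020" and "XE112", while anchored and
exact agree and flag only "E102" and "E112".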
>
>
> That's it, if I understood the problem.
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> At 15:55 on 03/06/20, Ana Marija wrote:
> > Hello.
> >
> > I am trying to filter only rows that have ANY of these variables:
> > E109, E119, E149
> >
> > so I did:
> > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
> >
> > then I checked what I got:
> >> s0 <- sapply(controls, function(x) grep('^E10', x, value = TRUE))
> >> d0=unlist(s0)
> >> d10=unique(d0)
> >> d10
> >  [1] "E10"  "E103" "E104" "E109" "E101" "E108" "E105" "E100" "E106" "E102"
> > [11] "E107"
> >> s1 <- sapply(controls, function(x) grep('^E11', x, value = TRUE))
> >> d1=unlist(s1)
> >> d11=unique(d1)
> >> d11
> >  [1] "E11"  "E119" "E113" "E115" "E111" "E114" "E110" "E118" "E116" "E112"
> > [11] "E117"
> >
> > I need help with changing this command
> > controls=t %>% filter_all(any_vars(. %in% c("E109", "E119","E149")))
> >
> > so that in the output I do not have any rows that include E102 or E112?
> >
> > Thanks
> > Ana
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>