[R] Filtering an Entire Dataset based on Several Conditions

Bert Gunter
Mon May 9 20:09:39 CEST 2022


This is trivial, so perhaps there is a miscommunication. How do you want to
handle values outside your desired range? I would simply change them to NA
(see below), but perhaps you have something else in mind that you need to
describe more explicitly. Anyway, below is a simple example of what I
*think* you asked for. Apologies if I have misunderstood.

> set.seed(567)
> ## create a data frame with 3 columns and 5 rows drawn from N(mean = 0, sd = 3)
> d <- as.data.frame(lapply(rep(5,3), function(x)round(rnorm(x,0,3),2)))
> names(d) <- LETTERS[1:3]
> d
      A     B     C
1  1.97 -1.23 -3.41
2  1.02 -1.12 -2.27
3 -1.92 -6.37 -6.44
4 -4.32  0.18  4.08
5  0.66 -5.82 -0.81
> d[abs(d) > 3] <- NA
> d
      A     B     C
1  1.97 -1.23    NA
2  1.02 -1.12 -2.27
3 -1.92    NA    NA
4    NA  0.18    NA
5  0.66    NA -0.81
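
If the goal is instead to keep only the rows in which every value lies in
the range, the same logical test can be turned into a row index. The
following is only a minimal sketch, not part of the original exchange: it
rebuilds the data frame d from the values printed above, and the names
keep, d_rows, d_na and d_rows2 are illustrative. It assumes "in range"
means strictly between -3 and 3, so a value of exactly 3 or -3 is excluded
here, whereas the replacement above (abs(d) > 3) would leave it in place.

```r
# Values copied from the printed data frame above
d <- data.frame(
  A = c(1.97, 1.02, -1.92, -4.32, 0.66),
  B = c(-1.23, -1.12, -6.37, 0.18, -5.82),
  C = c(-3.41, -2.27, -6.44, 4.08, -0.81)
)

# TRUE for rows in which every value lies strictly between -3 and 3
keep <- rowSums(abs(d) < 3) == ncol(d)

# Option 1: subset the rows directly
d_rows <- d[keep, , drop = FALSE]

# Option 2: set out-of-range cells to NA, then drop incomplete rows
d_na <- d
d_na[abs(d_na) >= 3] <- NA
d_rows2 <- na.omit(d_na)

print(d_rows)   # only row 2 has all three values inside (-3, 3)
```

Both options give the same single row for this example. Option 2 is
convenient when the NA-marked version is wanted as an intermediate result
anyway.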

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Mon, May 9, 2022 at 9:44 AM Paul Bernal <paulbernal07 using gmail.com> wrote:

> Dear Rui,
>
> I was trying to dput() the datasets I am working on, but since they are a bit
> large (42,000 rows by 60 columns) I couldn't capture the whole structure of
> the data to include it here, so I am attaching a couple of files. One is
> the raw data (called trainFeatures42k), which is the data I need to
> normalize, and the other is normalized_Data, which is the data normalized
> (or at least I think I got to normalize it).
>
>  Normalized_Data.csv
> <https://drive.google.com/file/d/143I1O710gAqWjzx48Gt1bwUbrG0mbpfa/view?usp=drive_web>
>  trainFeatures42k.xls
> <https://drive.google.com/file/d/1deMzGMkJyeVsnRzTKirmm4VqIBRzbvzV/view?usp=drive_web>
>
> I have tried some of the code you and other friends from the community have
> kindly shared, but have not been able to filter values > -3 and < 3.
>
> Thank you all for your valuable help always.
> Best,
> Paul
>
> El lun, 9 may 2022 a las 4:22, Rui Barradas (<ruipbarradas using sapo.pt>)
> escribió:
>
> > Hello,
> >
> > Something like this?
> > First normalize the data.
> > Then an apply() loop creates a logical matrix giving which numbers are in
> > the range -3 to 3.
> > If they are all TRUE then their sum by rows is equal to the number of
> > columns. This creates a logical index i.
> > Use that index i to subset the scaled data set.
> >
> > # test data set, remove the Species column (not numeric)
> > df1 <- iris[-5]
> >
> > df1_norm <- scale(df1)
> > i <- rowSums(apply(df1_norm, 2, \(x) x > -3 & x < 3)) == ncol(df1_norm)
> >
> > # returns a matrix
> > df1_norm[i, ]
> >
> > # returns a data.frame
> > as.data.frame(df1_norm[i,])
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Às 09:23 de 09/05/2022, Paul Bernal escreveu:
> > > Dear friends,
> > >
> > > I have a dataframe in which every single (i,j) entry (i standing for the
> > > ith row, j for the jth column) has been normalized (converted to z-scores).
> > >
> > > Now I want to filter or subset the dataframe so that I end up with a
> > > dataframe containing only entries greater than -3 and less than 3.
> > >
> > > How could I accomplish this?
> > >
> > > Best,
> > > Paul
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
>
