[R] R how to find outliers and zero mean columns?
normanmath1 at gmail.com
Thu Mar 31 04:30:12 CEST 2016
Thanks for your reply. I know these basics in R.
But suppose you have a data frame X with 300 features.
From those 300 features I need to pull out the names of each feature
that has zero values for all the observations in that sample.
Here I am looking for a package or a function to do that.
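No extra package seems to be needed for this part. As a minimal base R sketch, assuming X is a data frame whose columns are numeric, the names of the all-zero features can be collected with vapply. The toy data frame and the helper name is_all_zero are made up for illustration, and treating a column that contains NA as "not all zero" is an assumption.

```r
# Toy stand-in for the real data frame (hypothetical column names).
X <- data.frame(a = c(0, 0, 0), b = c(1, 0, 2), c = c(0, 0, 0))

# TRUE when every value in a column is exactly zero.
# Assumption: a column containing NA does not count as all-zero.
is_all_zero <- function(col) is.numeric(col) && !anyNA(col) && all(col == 0)

# One logical flag per feature.
zero_flags <- vapply(X, is_all_zero, logical(1))

# Names of the features that are zero for every observation.
zero_names <- names(X)[zero_flags]

# The same data frame with those features dropped.
X_clean <- X[, !zero_flags, drop = FALSE]
```

With 300 features, zero_names is the list of names to report, and X_clean keeps only the remaining columns.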
And how do I know whether there are abnormal values in each feature? Say
I have 300 features and 100000 observations. It is hard to look through everything
in the Excel file. Instead, I am looking for a package that does the
I hope you understood.
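For a first look at abnormal values across many features, one possible base R sketch is to compute each column's range and count known sentinel codes per column. The codes 99999 and -1 are taken from the original question; the toy data and all variable names here are hypothetical, and whether those codes really mean "missing" is an assumption to check against the data source.

```r
# Toy stand-in for the real data frame (hypothetical column names).
X <- data.frame(age = c(23, 99999, 41), score = c(-1, 5, -1), n = c(1, 2, 3))

# Values assumed to be missing-data codes rather than real measurements.
sentinels <- c(99999, -1)

# Per-feature minimum and maximum, one row per feature.
ranges <- t(vapply(X, function(col) range(col, na.rm = TRUE), numeric(2)))
colnames(ranges) <- c("min", "max")

# How many sentinel codes each feature contains.
n_sentinel <- vapply(X, function(col) sum(col %in% sentinels), integer(1))

# Names of the features that contain at least one sentinel code.
flagged <- names(X)[n_sentinel > 0]

# Copy of the data with the sentinel codes replaced by NA.
X_na <- X
X_na[] <- lapply(X_na, function(col) replace(col, col %in% sentinels, NA))
```

Scanning ranges for implausible minima or maxima is usually quicker than reading the raw file, and flagged gives the feature names to investigate.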
Thanks a lot
On Thu, Mar 31, 2016 at 1:13 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Norman,
> To check whether all values of an object (say "x") fulfill a certain
> condition (==0):
> If your object (X) is indeed a data frame, you can only do this by
> column, so if you want to get the results:
> all_zeros<-function(x) return(all(x==0))
> If your data frame (or a subset) contains all numeric values, you can
> finesse the problem like this:
> What you get is a list of logical (TRUE/FALSE) values from lapply, so
> it has to be unlisted to get a vector of logical values like you get
> with "apply".
> You can then use that vector to index (subset) the original data frame
> by logically inverting it with ! (NOT):
> Your "outliers" look suspiciously like missing values from certain
> statistical packages. If you know the values you are looking for, you
> can do something like:
> and then "remove" them by replacing those values with NA:
> Be aware that all these hackles (diminutive of hacks) are pretty
> specific to this example. Also remember that if this is homework, your
> karma has just gone down the cosmic sinkhole.
> On Thu, Mar 31, 2016 at 9:56 AM, Norman Pat <normanmath1 at gmail.com> wrote:
> > Hi team
> > I am new to R so please help me to do this task.
> > Please find the attached data sample. But in the original data frame I
> > have 350 features and 400000 observations.
> > I need to carry out these tasks.
> > 1. How to identify features (names) that have all zeros?
> > 2. How to remove features that have all zeros from the dataset?
> > 3. How to identify features (names) that have outliers such as 99999, -1 in
> > the data frame?
> > 4. How to remove outliers?
> > Many thanks
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > and provide commented, minimal, self-contained, reproducible code.