[R] detect and replace outliers by the average

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Sat Apr 22 03:53:03 CEST 2023


I think this discussion has gone off the rails into matters that lie outside
the purview of this list.

Bert

On Fri, Apr 21, 2023 at 6:16 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>
> Sometimes outliers happen. No matter the sample size there is always the possibility that one or more values are correct though highly improbable.
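To put a rough number on this: if values are genuine, independent draws from a normal distribution, the chance that at least one of them lies more than three standard deviations from the mean grows quickly with the sample size. The usual 1.5 * IQR boxplot rule also flags roughly 0.7% of perfectly valid normal data. The R calculation below assumes independent standard normal values. Real data need not be normal, so treat it as an illustration only.

```r
# Assumption: n independent, genuine standard normal values.
# Chance that at least one of them lies more than 3 SD from the mean.
p3 <- 2 * pnorm(-3)                # P(|Z| > 3) for a single value, about 0.0027
n <- c(10, 100, 1000, 10000)
p_any <- 1 - (1 - p3)^n            # P(at least one |Z| > 3 among n values)
round(p_any, 3)

# Share of valid normal data flagged by the 1.5 * IQR boxplot rule:
# the population quartiles are +/- qnorm(0.75) and the IQR is 2 * qnorm(0.75),
# so the fences sit at +/- (qnorm(0.75) + 1.5 * 2 * qnorm(0.75)).
fence <- 4 * qnorm(0.75)
p_tukey <- 2 * pnorm(-fence)       # about 0.007, i.e. roughly 7 values per 1000
p_tukey
```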
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Richard O'Keefe
> Sent: Friday, April 21, 2023 7:31 PM
> To: AbouEl-Makarim Aboueissa <abouelmakarim1962 using gmail.com>
> Cc: R mailing list <r-help using r-project.org>
> Subject: Re: [R] detect and replace outliers by the average
>
> This can be seen as three steps:
> (1) identify outliers
> (2) replace them with NA (trivial)
> (3) impute missing values.
> There are packages for imputing missing data.
> See
> https://www.analyticsvidhya.com/blog/2016/03/tutorial-powerful-packages-imputing-missing-values/
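Applied to the original question, the three steps might look like the base-R sketch below. The attached data.csv is not reproduced in the archive, so the data frame here is made-up toy data that only borrows the column names from the question (factor, x1, x2). The helper names drop_outliers(), impute_mean() and clean_by_group() are hypothetical. Outliers are flagged with the same boxplot.stats() rule as in the posted code, but within each level of factor. The replacement value is assumed to be the mean of the remaining (non-outlying, non-missing) values in the same group. Unlike the posted code, it leaves the grouping column alone and also fills the NAs, which boxplot.stats() never reports as outliers.

```r
# Hypothetical toy data standing in for the attachment: a grouping column
# "factor" (levels 0, 1, 2) and two numeric columns with NAs and extreme values.
data <- data.frame(
  factor = factor(rep(0:2, each = 6)),
  x1 = c(5.1, 4.9, 5.3, NA, 5.0, 25.0,
         7.2, 7.0, 6.8, 7.1, NA, 7.3,
         3.9, 4.1, 4.0, 4.2, 0.5, NA),
  x2 = c(10, 11, 9, 10, NA, 12,
         20, 21, 19, 60, 20, 22,
         NA, 30, 31, 29, 30, 32)
)

# Steps 1 and 2: flag outliers with the 1.5 * IQR boxplot rule, then set them to NA.
drop_outliers <- function(x) {
  out <- boxplot.stats(x)$out       # NAs are never reported as outliers
  x[x %in% out] <- NA
  x
}

# Step 3 (simplest possible imputation): replace every NA, original or new,
# with the mean of the values that remain.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply steps 1-3 separately within each level of the grouping variable g.
clean_by_group <- function(x, g) {
  ave(x, g, FUN = function(v) impute_mean(drop_outliers(v)))
}

num_cols <- c("x1", "x2")           # leave the grouping column untouched
data[num_cols] <- lapply(data[num_cols], clean_by_group, g = data$factor)
data
```

Plain mean imputation shrinks the within-group variance, so the imputation packages linked above are worth a look before relying on this.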
>
> Here I just want to address the first step.
> An observation is only an outlier relative to some model.
> Outliers can indicate
> - data that are just wrong (a data entry error, a failing battery in a measuring
>   device, all sorts of stuff).  In this case, deletion + imputation makes
>   sense.
> - data that are generated by a mixture of two or more processes,
>   not the single process you thought was there.  In this case,
>   deletion + imputation is dangerous.  The world is trying to tell
>   you something and you are ignoring it.
> - the model is wrong.  Here again, deletion + imputation is
>   dangerous.  You need a better model.
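Here is a small illustration of how the same value can be an outlier under one model and not under another. In the made-up data below, a large group sits near 10 and a small group near 100. If all values are treated as one process, the 1.5 * IQR rule from boxplot.stats() flags the two large values. If each group is treated as its own process, nothing is flagged. Deleting and imputing those two values would hide exactly the kind of mixture described in the second point.

```r
# Made-up data: a large group near 10 and a small group near 100.
x <- c(9, 10, 10, 11, 10, 9, 11, 10, 10, 11, 99, 101)
g <- factor(rep(c("a", "b"), times = c(10, 2)))

# Model 1: a single process. Apply the 1.5 * IQR rule to the pooled values.
pooled_out <- boxplot.stats(x)$out
print(pooled_out)                  # 99 101

# Model 2: one process per group. Apply the same rule within each level of g.
within_out <- unlist(lapply(split(x, g), function(v) boxplot.stats(v)$out))
print(within_out)                  # nothing is flagged
```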
>
> "Detecting outliers in R" as a web query turned up
> https://statsandr.com/blog/outliers-detection-in-r/
> on the first page of results.  There's plenty of material about finding outliers.
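For reference, the rule behind boxplot.stats(), which the code quoted below uses, flags values lying more than coef * IQR (default 1.5) beyond the lower or upper hinge, where the hinges come from fivenum(). The hypothetical helper flag_outliers() below reproduces that rule as a logical vector. This makes it easy to see that missing values come back as NA rather than TRUE, so a test such as x %in% boxplot.stats(x)$out never selects them for replacement.

```r
# Hypothetical helper: Tukey's fences computed with the same hinges that
# boxplot.stats() uses. Returns TRUE/FALSE per value and NA for missing values.
flag_outliers <- function(x, coef = 1.5) {
  h <- fivenum(x, na.rm = TRUE)[c(2, 4)]       # lower and upper hinges
  iqr <- h[2] - h[1]
  x < h[1] - coef * iqr | x > h[2] + coef * iqr  # NA inputs stay NA
}

x <- c(2.1, 2.4, 2.2, 2.5, NA, 2.3, 9.8)
flags <- flag_outliers(x)
flags                               # only 9.8 is TRUE; the missing value is NA
x[which(flags)]                     # same values as boxplot.stats(x)$out
```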
>
> But please give very VERY serious consideration to the possibility that some or even all of your outliers are actually GOOD data telling you something you need to know.
>
>
> On Fri, 21 Apr 2023 at 06:38, AbouEl-Makarim Aboueissa <abouelmakarim1962 using gmail.com> wrote:
>
> > Dear All:
> >
> > Re: detect and replace outliers by the average
> >
> > The dataset (please see attached) contains a grouping column "factor"
> > and two data columns, "x1" and "x2", with some NA values. I need some
> > help to detect the outliers and replace them and the NAs with the
> > average within each level (0, 1, 2) for each variable, "x1" and "x2".
> >
> > I tried the code below, but it did not accomplish what I want to do.
> >
> > data <- read.csv("G:/20-Spring_2023/Outliers/data.csv", header = TRUE)
> > data
> >
> > replace_outlier_with_mean <- function(x) {
> >   replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm = TRUE))  #### , na.rm=TRUE NOT working
> > }
> >
> > data[] <- lapply(data, replace_outlier_with_mean)
> >
> > Thank you all very much for your help in advance.
> >
> > with many thanks
> >
> > abou
> >
> >
> > ______________________
> >
> >
> > AbouEl-Makarim Aboueissa, PhD
> >
> > Professor, Mathematics and Statistics; Graduate Coordinator
> >
> > Department of Mathematics and Statistics, University of Southern Maine
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >


