[R] Boxplot Help

Gabor Grothendieck ggrothendieck at gmail.com
Sat Aug 19 05:01:41 CEST 2006


In reviewing this I found an error in the case where an outlier in
one group has the same value as a non-outlier in another group.
Also, the iris example has no duplicate outliers, so it's not a very
good test.  Here is a much shorter version that does not have the
cited problem, together with more suitable test data.

For each group g, we find the indices idx in x of the values that
boxplot() reported as outliers for that group in out$out, and then
use text() to display those indices.  (Note that indices will
overprint if a group has multiple outliers with the same value.
One could try jittering the x or y values passed to text() to
address this.)

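# Test data: 100 is an outlier in group 1 but an ordinary value in group 2;
# 50 is a duplicated outlier in group 3.
x <- c(1:49, 100, 51:100, 101:148, 50, 50)
grp <- gl(3, 50)
out <- boxplot(x ~ grp)
for(g in unique(out$group)) {
   # out$group holds group numbers, so compare against the factor's integer
   # codes; this also works when the levels are not "1", "2", "3"
   idx <- which(x %in% out$out[out$group == g] & as.integer(grp) == g)
   text(g, x[idx], idx, pos = 4, col = 2, cex = .5)
}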
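
The overprinting mentioned above could be handled along the following
lines.  This is only a sketch under some assumptions: label_outliers is
a hypothetical helper (not part of base R), the labels and offset
arguments are made up for illustration, and it uses a fixed horizontal
offset for repeated values rather than random jitter().  The offset of
0.15 is a guess that may need tuning for a given plot size.

```r
# Hypothetical helper: draws the boxplot and labels each outlier with
# labels[i], where i is the position of the outlier in x.
label_outliers <- function(x, grp, labels = seq_along(x), offset = 0.15) {
  grp <- factor(grp)
  out <- boxplot(x ~ grp)
  found <- integer(0)
  for (g in unique(out$group)) {
    # indices of the outliers that belong to group g only
    idx <- which(x %in% out$out[out$group == g] & as.integer(grp) == g)
    # k = 0 for the first outlier with a given value, 1 for the second, ...
    k <- ave(seq_along(idx), x[idx], FUN = seq_along) - 1
    # shift repeated values to the right so their labels do not overprint
    text(g + offset * k, x[idx], labels[idx], pos = 4, col = 2, cex = .5)
    found <- c(found, idx)
  }
  invisible(found)  # indices of all labelled outliers, in group order
}

x <- c(1:49, 100, 51:100, 101:148, 50, 50)
grp <- gl(3, 50)
idx <- label_outliers(x, grp)
idx  # 50 149 150
```

To show ids rather than positions, something like
labels = rownames(dados) could presumably be passed instead, assuming
the row names of the data frame carry the ids.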


On 8/18/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Try this:
>
> result <- boxplot(Petal.Length ~ Species, iris)
> if (length(result$out))
>  text(result$group, result$out, match(result$out, iris$Petal.Length),
>   pos = 4, col = "red")
>
> If the outliers can be non-unique then match is not enough.
> In that case assume that the nth occurrence of
> any value in result$out is also the nth occurrence in the
> vector boxplotted.  (Sort the data frame by group if that is
> not the case.)   This assumption is sufficient to allow us to write
> posof which gives the index into the data frame of any value in out.
>
> # determine position of x in y
> # assuming that if there are duplicates in x that
> # they occur the same number of times and in
> # the same order so that the 2nd occurrence of 37
> # in x would correspond to the 2nd occurrence of 37 in y
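> posof <- function(x, y) {
>   # n[m]: how many times x[m] has occurred in x up to position m
>   # (seq_along, not seq, so a single outlier is handled correctly)
>   n <- sapply(seq_along(x), function(m) sum(x[m] == x[1:m]))
>   # index in y of the n-th occurrence of each value of x
>   mapply(function(x, n) which(y == x)[n], x, n)
> }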
>
> result <- boxplot(Petal.Length ~ Species, iris)
> if (length(result$out))
>  text(result$group, result$out, posof(result$out, iris$Petal.Length),
>   pos = 4, col = "red")
>
>
>
> On 8/18/06, Ana Patricia Martins <ana.pmartins at ine.pt> wrote:
> > Hello R-users and developers,
> >
> > Once again, I'm asking for your help.
> >
> > I can identify outliers in a boxplot with this instruction:
> >
> > result <- boxplot(Income ~ Sex, col = "lightgray", data = dados)
> > if (length(result$out))
> >   text(result$group, result$out, result$out, pos = 4, col = "red")
> >
> > But I cannot identify the outlier's id (variable names) in the boxplot.
> > Can anyone help me?
> >
> > Thanks in advance,
> >
> > Regards,
> > Ana Patricia Martins
> > -------------------------------------------
> > Serviço Métodos Estatísticos
> > Departamento de Metodologia Estatística
> > INE - Portugal
> > Telef:  218 426 100 - Ext: 3210
> > E-mail: ana.pmartins at ine.pt