[R] accessing the indices of outliers in a data frame boxplot

Chuck Cleland ccleland at optonline.net
Fri Jan 25 18:01:14 CET 2008


On 1/25/2008 11:39 AM, Karin Lagesen wrote:
> I have a data frame containing columns which are factors. I use this
> to make boxplots for the data, with one box per factor. I would now
> like to get at the data in the data frame which corresponds to the
> outliers. I have so far found the $out, which gives "the values of any
> data points which lie beyond the extremes of the whiskers", but I
> haven't found anything which will let me get at the indices in the
> original data frame for these outliers. 
> 
> I think there might be a chance that I could simply compare the values
> I am plotting from my data frame with the values for the whiskers and
> use that as a criterion, but I am uncertain of how to do this without
> doing it manually. The factor I am plotting against contains 17
> levels, and I'd thus like to see if there is a somewhat more general
> solution available.
> 
> Thanks for your help!
> 
> Karin

   You can use the %in% operator (equivalent to is.element()) to see 
which data values in your data frame match an outlier value.  Then use 
which() to return the indices of the TRUE elements.  For example:

set.seed(245)

df <- data.frame(GRP = rep(LETTERS[1:4], each=25), Y = rchisq(100, 2))

mybp <- boxplot(Y ~ GRP, data=df)

which(df$Y %in% mybp$out)
[1]  8 12 47 66 88 93

mybp$out
[1] 5.919915 9.135578 5.723714 8.758584 8.502147 4.920513

df$Y[which(df$Y %in% mybp$out)]
[1] 5.919915 9.135578 5.723714 8.758584 8.502147 4.920513

   See ?is.element and ?which.
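
   One caveat: %in% matches by value across the whole column, so a 
value that is an outlier in one group is also flagged wherever the same 
value occurs in another group where it is not an outlier.  With 
continuous data such ties are unlikely, but with rounded or integer data 
they can happen.  A minimal sketch of a group-aware variant follows, 
assuming the default whisker rule (range = 1.5, which is what 
boxplot.stats() uses as coef).  The names is_out and out_idx are 
introduced here only for illustration:

```r
set.seed(245)
df <- data.frame(GRP = rep(LETTERS[1:4], each = 25), Y = rchisq(100, 2))

# Flag outliers within each group, using the rule boxplot() applies by
# default (boxplot.stats() with coef = 1.5). ave() returns 0/1 here
# because df$Y is numeric, so convert back to logical.
is_out <- as.logical(ave(df$Y, df$GRP,
                         FUN = function(y) y %in% boxplot.stats(y)$out))

out_idx <- which(is_out)  # row indices in the original data frame
df[out_idx, ]             # the outlying rows, with their group labels
```

   Because the test is done separately within each level of GRP, this 
should scale to a factor with many levels without any manual comparison 
against the whisker values.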

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894


