[R] : unusual combinations of categorical data

Joshua Wiley jwiley.psych at gmail.com
Mon Nov 8 23:39:29 CET 2010


Hi,

On Mon, Nov 8, 2010 at 2:25 PM, Alan Chalk <Alan.Chalk at gcc.rsagroup.com> wrote:
> Regarding unusual combinations of factors in categorical data.

Do you mean data where all variables are categorical?

> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?

What counts as an "outlier" or "unusual" varies: something unusual in
one data set may be ordinary in another.  If you are dealing with
strictly categorical variables, I am not certain how you would define
an outlier.  Categories carry only the meaning attached to them, so it
seems they could only indicate outliers if you decided that an entire
category was an outlier (e.g., males, females, half-man-half-ox).  If
instead you have one continuous variable in mind, broken down by the
levels of a factor, then you could just use some simple plots (e.g.,
ggplot() + geom_point() + facet_grid(factor ~ .) or something
similar).  You could also z-score the values within each factor level
and then extract the z-scores more extreme than +/- 3, or whatever
cutoff you like.  It would be easier to give you feedback if you had a
more specific example.
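
As a rough sketch of the z-score idea, here is one way it could look
in base R.  The data frame `dat`, its columns `g` (a factor) and `y`
(a continuous variable), the planted extreme value, and the cutoff of
3 are all made up for illustration; substitute your own data and
threshold.

```r
# Made-up example data: a factor 'g' with three levels and a
# continuous variable 'y' (both names are assumptions, not your data).
set.seed(1)
dat <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 50)),
  y = rnorm(150)
)

# Plant one obviously extreme value in level "b" so there is
# something to find.
dat$y[75] <- 10

# z-score y separately within each level of g.
# ave() applies FUN per group and returns the results in the
# original row order.
dat$z <- ave(dat$y, dat$g, FUN = function(v) (v - mean(v)) / sd(v))

# Keep the rows whose within-level z-score is more extreme than the
# cutoff (3 here, but any value you like).
cutoff <- 3
extreme <- dat[abs(dat$z) > cutoff, ]
extreme
```

If ggplot2 is installed, the same data frame could be plotted with
something like ggplot(dat, aes(x = seq_along(y), y = y)) +
geom_point() + facet_grid(g ~ .) to eyeball each level.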

Cheers,

Josh

>
> Thanks.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


