[R] ggplot2: mapping categorical variable to color aesthetic with faceting

Bryan Hanson hanson at depauw.edu
Tue Oct 6 21:35:12 CEST 2009


A few days ago on the list I wrestled with the aes() vs. aes_string()
issue, along with the same issue for faceting.

The way I ended up handling the point you raise, Baptiste, is perhaps
rather inefficient, but my data sets are not large. I let the user pass
column names as character strings, then use them to construct extra data
frame columns with fixed names (res, fac1, fac2); ggplot2 can use these
because they are "known" in the data frame. Here's what the first part of
the actual function looks like; you can see how I avoided aes_string() and
the related problems with facet_grid():

compareCats <- function(data = NULL, res = NULL, fac1 = NULL, fac2 = NULL,
    fac1order = NULL, fac2order = NULL, fac1cols = NULL,
    method = c("sem", "iqr", "mad", "box", "points"),
    title = "Comparison of Categories", y.lab = "your text here",
    subtitle = "optional explanatory caption") {

    require(ggplot2)

    # restructure data so names will match, re-ordering too;
    # res, fac1 and fac2 are column names given as character strings

    data$res <- data[[res]]
    a <- match(fac1, names(data))
    b <- match(fac2, names(data))
    # factor(x, levels = NULL) turns every value into NA, so fall back
    # to the existing level order when no ordering is supplied
    if (is.null(fac1order)) fac1order <- levels(factor(data[[a]]))
    if (is.null(fac2order)) fac2order <- levels(factor(data[[b]]))
    data$fac1 <- factor(data[[a]], levels = fac1order)
    data$fac2 <- factor(data[[b]], levels = fac2order)

    # now the plot

    p <- ggplot(data, aes(fac1, res, color = fac1)) + facet_grid(. ~ fac2) +
        xlab(NULL) + opts(title = title,
            axis.text.x = theme_text(colour = "black"),
            axis.ticks = theme_blank())

    # ... (remainder of function omitted)
}

Then, depending on the method specified by the user, additional geoms
are added and the plot is created.
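The restructuring step in the function above can be pulled out and checked on its own. What follows is a minimal base-R sketch, assuming (as in the function) that the user passes column names as character strings; the helper name standardize_columns() and the example columns yield, treatment and size are hypothetical, chosen only for illustration. No ggplot2 call is made here.

```r
# Hypothetical helper (not the original function): copy user-named columns
# into the fixed names res, fac1 and fac2 that a plain aes()/facet_grid()
# call can then find inside the data frame.
standardize_columns <- function(data, res, fac1, fac2,
                                fac1order = NULL, fac2order = NULL) {
  # With no explicit ordering, keep the default level order
  if (is.null(fac1order)) fac1order <- levels(factor(data[[fac1]]))
  if (is.null(fac2order)) fac2order <- levels(factor(data[[fac2]]))
  data$res  <- data[[res]]
  data$fac1 <- factor(data[[fac1]], levels = fac1order)
  data$fac2 <- factor(data[[fac2]], levels = fac2order)
  data
}

# The "user" has named the columns yield, treatment and size
raw <- data.frame(yield     = c(1.2, 3.4, 2.2, 0.7),
                  treatment = c("B", "A", "B", "A"),
                  size      = c("sm", "lrg", "lrg", "sm"))
std <- standardize_columns(raw, res = "yield", fac1 = "treatment",
                           fac2 = "size", fac1order = c("B", "A"))
levels(std$fac1)  # "B" "A": user-chosen order
levels(std$fac2)  # "lrg" "sm": default order
```

Because the copied columns always carry the same names, the ggplot() call itself never has to change; the cost is a few extra columns per call, which matters little for small data sets.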

This gets the job done, but if there are further suggestions, I'd love to
learn other solutions.

Bryan

On 10/6/09 1:08 PM, "baptiste auguie" <baptiste.auguie at googlemail.com>
wrote:

> Further to my previous reply, it occurred to me that ggplot2 would
> only ever use data and colors in your calls to compareCats(): res =
> res, fac1 = fac1, fac2 = fac2 have no effect whatsoever.
> 
> If you want the user to be able to specify the variables used in the
> ggplot2 call, you probably want to look at ?aes_string, as shown
> below,
> 
> compareCats <- function(data, fac1="fac1", fac2="fac2", res="res",
>                         colors=c("red", "blue")) {
> 
>   require(ggplot2)
>   p <- ggplot(data, aes_string(x=fac1, y=res, color=fac1)) +
>     facet_grid(paste(". ~ ", fac2))
>   jit <- position_jitter(width = 0.1)
>   p <- p + layer(geom = "jitter", position = jit) +
>     scale_colour_manual(values=colors)
>   print(p)
>   }
> 
> test <- data.frame(res = rnorm(100), fac1 = as.factor(rep(c("A", "B"), 50)),
>   fac2 = as.factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))
> 
> compareCats(data = test)
> 
> rem <- sample(nrow(test), 10) # randomly remove a few points here and there
> last_plot() %+% test[-rem, ] # replot with new dataset
> 
> HTH,
> 
> baptiste
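Baptiste's point is that aes(fac1, res) looks those names up as columns of data, so the res, fac1 and fac2 arguments are never consulted. When the column names arrive as character strings, the lookup can be made explicit. The base-R sketch below uses a hypothetical helper, facet_spec(), to check that the named columns exist and to build the faceting formula from a string with as.formula(); it does not call ggplot2. (Later ggplot2 releases deprecate aes_string() in favour of the .data pronoun, e.g. aes(x = .data[[fac1]]).)

```r
# Hypothetical helper: verify that the user-supplied column names exist in
# `data`, then build the ". ~ <fac2>" faceting formula from the string.
facet_spec <- function(data, res, fac1, fac2) {
  absent <- setdiff(c(res, fac1, fac2), names(data))
  if (length(absent) > 0) {
    stop("column(s) not found in data: ", paste(absent, collapse = ", "))
  }
  as.formula(paste(". ~", fac2))
}

test <- data.frame(res = rnorm(100),
                   fac1 = factor(rep(c("A", "B"), 50)),
                   fac2 = factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))

f <- facet_spec(test, res = "res", fac1 = "fac1", fac2 = "fac2")
deparse(f)  # ". ~ fac2"

# A misspelled column name fails early with a readable message
tryCatch(facet_spec(test, "res", "fac1", "size"), error = conditionMessage)
```

Failing early on a bad name gives a clearer message than the error ggplot2 would raise later when it cannot find the variable.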
> 
> 
> 
> 
> 2009/10/6 baptiste auguie <baptiste.auguie at googlemail.com>:
>> Hi,
>> 
>> I may be missing an important design decision, but could you not have
>> only a single data.frame as an argument of your function? From your
>> example, it seems that the colour can be mapped to the fac1 variable
>> of "data",
>> 
>> compareCats <- function(data) {
>> 
>>   require(ggplot2)
>>   p <- ggplot(data, aes(fac1, res, color=fac1)) + facet_grid(. ~ fac2)
>>   jit <- position_jitter(width = 0.1)
>>   p <- p + layer(geom = "jitter", position = jit) +
>>     scale_colour_manual(values=c("red", "blue"))
>>   print(p)
>>   }
>> 
>> 
>> test <- data.frame(res = rnorm(100), fac1 = as.factor(rep(c("A", "B"), 50)),
>>   fac2 = as.factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))
>> 
>> compareCats(data = test)
>> 
>> rem <- sample(nrow(test), 5) # randomly remove a few points here and there
>> last_plot() %+% test[-rem,] # replot with new dataset
>> 
>> 
>> HTH,
>> 
>> baptiste
>> 
>> 
>> 
>> 2009/10/6 Bryan Hanson <hanson at depauw.edu>:
>>> Hello again... I'm making a faceted plot of a response on two categorical
>>> variables using ggplot2 and having trouble with the coloring. Here is a
>>> sample that produces the desired plot:
>>> 
>>> compareCats <- function(data, res, fac1, fac2, colors) {
>>> 
>>>    require(ggplot2)
>>>    p <- ggplot(data, aes(fac1, res)) + facet_grid(. ~ fac2)
>>>    jit <- position_jitter(width = 0.1)
>>>    p <- p + layer(geom = "jitter", position = jit, color = colors)
>>>    print(p)
>>>    }
>>> 
>>> test <- data.frame(res = rnorm(100), fac1 = as.factor(rep(c("A", "B"), 50)),
>>>    fac2 = as.factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))
>>> 
>>> compareCats(data = test, res = res, fac1 = fac1, fac2 = fac2,
>>>     colors = c("red", "blue"))
>>> 
>>> Now, if I get away from idealized data where there are the same number of
>>> data points per group (25 in this case), I run into problems.  So, if you
>>> do:
>>> 
>>> rem <- sample(nrow(test), 5) # randomly remove a few points here and there
>>> test <- test[-rem,]
>>> compareCats(data = test, res = res, fac1 = fac1, fac2 = fac2,
>>>     colors = c("red", "blue"))
>>> 
>>> R throws an error due to mismatch between the recycling of colors and the
>>> actual number of data points:
>>> 
>>> Error in `[<-.data.frame`(`*tmp*`, gp, value = list(colour = c("red",  :
>>>  replacement element 1 has 2 rows, need 47
>>> 
>>> I'm new to ggplot2, but have been through the book and the web site enough
>>> to know that my problem is "mapping the variable to the aesthetic"; I also
>>> know I can either "map" or "set" the colors.
>>> 
>>> The question, finally:  is there a simple/elegant way to map a list of two
>>> colors corresponding to A and B onto any random sample size of A and B with
>>> faceting?  If not, and I must "set" the colors:  Do I compute the length of
>>> all possible combos of A, B with lrg, sm, and then create one long vector of
>>> colors for the entire plot?  I tried something like this, and was not
>>> successful, but perhaps could be with more work.
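For the "set" route there is no need to count every A/B by lrg/sm combination: indexing a named vector of colours by each row's fac1 label yields exactly one colour per row, whatever the group sizes. This also suggests why the balanced example appeared to work: with rep(c("A", "B"), 50) the rows alternate A, B within each panel, so recycling c("red", "blue") happened to line up with fac1, while a 47-row panel cannot be filled from a length-2 vector. A minimal base-R sketch follows; the names cols and point_cols are illustrative. For the "map" route, Baptiste's reply maps colour to fac1 and supplies the colours through scale_colour_manual(); in current ggplot2 releases a named values vector such as cols is matched to the levels by name.

```r
# One colour per level of fac1; the names must match the factor levels
cols <- c(A = "red", B = "blue")

set.seed(1)
test <- data.frame(res = rnorm(100),
                   fac1 = factor(rep(c("A", "B"), 50)),
                   fac2 = factor(rep(c("lrg", "lrg", "sm", "sm"), 25)))
# Drop 5 distinct rows so the groups are no longer balanced
test <- test[-sample(nrow(test), 5), ]

# Index the named vector by each row's label: the result always has
# nrow(test) elements, however many rows fall in each A/B x lrg/sm cell
point_cols <- unname(cols[as.character(test$fac1)])

# Cross-tabulate to confirm each level received exactly one colour
table(test$fac1, point_cols)
```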
>>> 
>>> All advice appreciated, Bryan (session info below)
>>> 
>>> *************
>>> Bryan Hanson
>>> Professor of Chemistry & Biochemistry
>>> DePauw University, Greencastle IN USA
>>> 
>>>> sessionInfo()
>>> R version 2.9.2 (2009-08-24)
>>> i386-apple-darwin8.11.1
>>> 
>>> locale:
>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>> 
>>> attached base packages:
>>> [1] grid      datasets  tools     utils     stats     graphics  grDevices methods
>>> [9] base
>>> 
>>> other attached packages:
>>>  [1] ggplot2_0.8.3      reshape_0.8.3      proto_0.3-8        mvbutils_2.2.0
>>>  [5] ChemoSpec_1.1      lattice_0.17-25    mvoutlier_1.4      plyr_0.1.8
>>>  [9] RColorBrewer_1.0-2 chemometrics_0.4   som_0.3-4          robustbase_0.4-5
>>> [13] rpart_3.1-45       pls_2.1-0          pcaPP_1.7          mvtnorm_09-7
>>> [17] nnet_7.2-48        mclust_3.2         MASS_7.2-48        lars_0.9-7
>>> [21] e1071_1.5-19       class_7.2-48
>>> 
>>> 
>> 



