[R] ggplot2: further query about back to back bar plots

Brian Diggs diggsb at ohsu.edu
Thu Jul 25 21:24:41 CEST 2013

On 7/25/2013 11:34 AM, Rui Barradas wrote:
> Hello,
> I'm not an expert in ggplot2 graphics but I can (partly) answer your
> first question. Inline.
> On 25-07-2013 18:30, Gavin Rudge wrote:
>> Further to my recent post on this topic and thanks to help received
>> already (thanks BTW), I've got back-to-back plots working nicely to
>> give me population pyramids, with some overlaid point data from a
>> different time period, using the code below.
>> #packages
>> library(ggplot2)
>> library(reshape2)
>> library(plyr)
>> #sample data
>> set.seed(33)
>> df<-data.frame(ag=c(1:18),males_year1=sample(100:200,18),females_year1=sample(100:200,18),males_year2=sample(100:200,18),females_year2=sample(100:200,18))
>> #melt the data set
>> df<-data.frame(melt(df,id="ag"))
>> df
>> #here is the plot
>> p<-ggplot(df)+
>> geom_bar(subset=.(df$variable=="males_year1"),stat="identity",aes(x=ag,y=value),fill="#6666FF")+
>> geom_bar(subset=.(df$variable=="females_year1"),stat="identity",aes(x=ag,y=-value),fill="#FF9333")+
>> geom_point(subset=.(df$variable=="males_year2"),stat="identity",aes(x=ag,y=value),size=3,colour="#330099")+
>> geom_point(subset=.(df$variable=="females_year2"),stat="identity",aes(x=ag,y=-value),size=3,colour="#CC3300")+
>>    coord_flip()+
>>    theme_bw()+
>> scale_y_continuous(limits=c(-200,200),breaks=seq(-200,200,50),labels=abs(seq(-200,200,50)))+
>> scale_x_continuous(limits=c(0,19),breaks=seq(1,18,1),labels=abs(seq(1,18,1)))+
>>    xlab("age group")+ylab("population")+
>>    theme_bw()+
>>    xlab("age group")+
>>    ylab("population")+
>>    geom_text(y=-100,x=19.2,label="Females")+
>>    geom_text(y=100,x=19.2,label="Males")
>> p
>> Two questions remaining.  Firstly have I used a large amount of code
>> to achieve this or is this about right for the effect that I'm after?
> You have repeated some code, the following lines show up twice.
>    theme_bw()+
>    xlab("age group")+
>    ylab("population")+

In addition to the repetition Rui notes, here is how I'd shorten it, 
albeit not by much (and with wrapping, actually more lines):

p <- ggplot(df)+
   geom_bar(subset=.(variable=="males_year1"), stat="identity",
            position="identity", aes(x=ag,y=value), fill="#6666FF")+
   geom_bar(subset=.(variable=="females_year1"), stat="identity",
            position="identity", aes(x=ag,y=-value), fill="#FF9333")+
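   geom_point(subset=.(variable=="males_year2"),
              aes(x=ag,y=value), size=3, colour="#330099")+
   geom_point(subset=.(variable=="females_year2"),
              aes(x=ag,y=-value), size=3, colour="#CC3300")+
   coord_flip()+
   theme_bw()+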
   scale_y_continuous("population", limits=c(-200,200),
                      breaks=seq(-200,200,50), labels=abs)+
   scale_x_continuous("age group", limits=c(0,19.2), breaks=seq(1,18,1))+
   annotate(geom="text", y=-100, x=19.2, label="Females")+
   annotate(geom="text", y= 100, x=19.2, label="Males")

The changes are:

* added position="identity" to the two geom_bar calls to suppress the
warning message "Stacking not well defined when ymin != 0"
* dropped stat="identity" in geom_point since that is the default
* pulled the xlab and ylab into the scale_x_continuous and
scale_y_continuous calls, since you already have those
* simplified the labels for the scales. For x, you don't need to do
anything to specify the labels; they work as expected based on the
breaks. For y, rather than give a vector of labels, give a function
which transforms the breaks into the labels you want (abs).
* converted the last two geom_text calls to annotations
* increased the limits in scale_x_continuous so that the annotations
were not lost
* changed the subset expressions to use variable rather than
df$variable; the subset should not refer to df directly
> Hope this helps,
> Rui Barradas
>> Secondly I'm quite confused about how to put a legend onto a plot like
>> this. I'm getting slowly into the ggplot way of doing things, but I'm
>> totally baffled by legends; say I wanted a legend with an appropriate
>> label for both genders and both time periods showing the colours of
>> the bars and dots I've used here as examples, how do I do this?  I've
>> tried scale_fill with a bunch of arguments to no avail.  I'm confused
>> about where in the hierarchy of ggplot commands you actually build the
>> legend and how you map it to your data.  The usual trawl of the
>> package pdf / cook book for R etc hasn't really helped. Can someone
>> show me how to do this please?

Your confusion with legends likely comes from how ggplot approaches
legends. A legend, for ggplot, shows the mapping between an aesthetic
(colour, shape, etc.) and the data values that it represents. Therefore,
a legend is only drawn when there is a mapping between data and
aesthetics. In your example, you set your colour/fill aesthetics
manually, so there is no mapping and hence no legend.
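
A small sketch of the difference between setting and mapping, using a
made-up data frame (toy, p_set and p_map are illustrative names, not
part of your data):

```r
library(ggplot2)

# Toy data, invented purely for illustration
toy <- data.frame(ag    = c(1, 2, 1, 2),
                  Sex   = c("males", "males", "females", "females"),
                  value = c(10, 12, 11, 9))

# Setting: fill is a constant outside aes(), so nothing is mapped
# and no legend is drawn
p_set <- ggplot(toy) +
  geom_bar(aes(x = ag, y = value), stat = "identity", fill = "#6666FF")

# Mapping: fill is tied to a data column inside aes(), so ggplot
# draws a legend explaining which fill stands for which Sex
p_map <- ggplot(toy) +
  geom_bar(aes(x = ag, y = value, fill = Sex), stat = "identity")
```

Printing p_set and p_map side by side should show the legend only on
the second.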

The variables that are implied by your data are Sex and Year, so make 
those explicit:

df2 <- cbind(df, colsplit(df$variable, "_", c("Sex", "Year")))
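
To see what that split produces, and where names such as
"females.year1" used further down come from, here is a rough base R
stand-in for the colsplit step (strsplit is used here only for
illustration; parts and combo are made-up names):

```r
# The four series names produced by melt()
variable <- c("males_year1", "females_year1", "males_year2", "females_year2")

# Base R stand-in for colsplit(): split each name at the underscore
parts <- do.call(rbind, strsplit(variable, "_", fixed = TRUE))
Sex  <- parts[, 1]
Year <- parts[, 2]

# interaction() joins the two pieces with a dot; these are the level
# names that the manual scales refer to
combo <- interaction(Sex, Year)
print(as.character(combo))
```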

Now we can simplify the two geom_bar and two geom_point calls into one
each, setting the sign of value based on the Sex column. I set the
colour and fill to the interaction of Sex and Year (I could have just
used variable, I suppose, but this is how I did it). Colour and fill are
really two separate aesthetics, but I need to treat them the same and
include both in each geom so that it comes out right in the end. The
colours that you set manually in the original version are now
specified in the scale_colour_manual and scale_fill_manual calls:

p <- ggplot(df2) +
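   geom_bar(subset=.(Year=="year1"), stat="identity",
            aes(x=ag, y=ifelse(Sex=="females",-1,1)*value,
                colour=interaction(Sex,Year),
                fill=interaction(Sex,Year))) +
   geom_point(subset=.(Year=="year2"),
              aes(x=ag, y=ifelse(Sex=="females",-1,1)*value,
                  colour=interaction(Sex,Year),
                  fill=interaction(Sex,Year)),
              size=3) +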
   annotate(geom="text", y=-100, x=19.2, label="Females") +
   annotate(geom="text", y= 100, x=19.2, label="Males") +
   scale_x_continuous("age group", limits=c(0,19.2), breaks=seq(1,18,1))+
   scale_y_continuous("population", limits=c(-200,200),
                      breaks=seq(-200,200,50), labels=abs) +
   scale_colour_manual("Sex and Year",
                       breaks = c("females.year1", "females.year2",
                                  "males.year1", "males.year2"),
                       values = c("#FF9333", "#CC3300",
                                  "#6666FF", "#330099")) +
   scale_fill_manual("Sex and Year",
                     breaks = c("females.year1", "females.year2",
                                "males.year1", "males.year2"),
                     values = c("#FF9333", "#CC3300",
                                "#6666FF", "#330099")) +
   coord_flip() +
   theme_bw()

Conceptually, you are mapping Sex to a color and Year to a shade of that 
color, but ggplot does not have a way of splitting up aspects of colors 
to different data variables (hue to one variable, saturation to another, 
say), so this hack of mapping the combination of them is used.

You can get the legend to be a bit more like a grid by adding

guide = guide_legend(nrow=2)

to both scale_colour_manual and scale_fill_manual. The labels in those 
can also be changed to something better (be sure to do it in both).
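
A sketch of where the guide argument and nicer labels could go. The
label text is made up, and naming the values vector by level is my own
suggestion here (a named values vector is matched to the data by name,
so the colours do not depend on the order they are listed in):

```r
library(ggplot2)

# Colours named by interaction(Sex, Year) level
pyramid_cols <- c("females.year1" = "#FF9333", "females.year2" = "#CC3300",
                  "males.year1"   = "#6666FF", "males.year2"   = "#330099")

# Hypothetical, more readable labels, in the same order as the breaks
pyramid_labs <- c("Females, year 1", "Females, year 2",
                  "Males, year 1", "Males, year 2")

# The same title, breaks, labels and guide go into both scales
fill_scale <- scale_fill_manual("Sex and Year",
                                breaks = names(pyramid_cols),
                                labels = pyramid_labs,
                                values = pyramid_cols,
                                guide = guide_legend(nrow = 2))
colour_scale <- scale_colour_manual("Sex and Year",
                                    breaks = names(pyramid_cols),
                                    labels = pyramid_labs,
                                    values = pyramid_cols,
                                    guide = guide_legend(nrow = 2))
```

These two objects would take the place of the scale_fill_manual and
scale_colour_manual calls in the plot above.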

>> Many thanks.
>> Gavin.

Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
