[R] ggplot2: further query about back to back bar plots
Brian Diggs
diggsb at ohsu.edu
Thu Jul 25 21:24:41 CEST 2013
On 7/25/2013 11:34 AM, Rui Barradas wrote:
> Hello,
>
> I'm not an expert in ggplot2 graphics but I can (partly) answer your
> first question, inline.
>
> On 25-07-2013 18:30, Gavin Rudge wrote:
>> Further to my recent post on this topic and thanks to help received
>> already (thanks BTW), I've got back-to-back plots working nicely to
>> give me population pyramids, with some overlaid point data from a
>> different time period, using the code below.
>>
>> #packages
>> library(ggplot2)
>> library(reshape2)
>> library(plyr)
>> #sample data
>> set.seed(33)
>> df<-data.frame(ag=c(1:18),males_year1=sample(100:200,18),females_year1=sample(100:200,18),males_year2=sample(100:200,18),females_year2=sample(100:200,18))
>>
>> #melt the data set
>> df<-data.frame(melt(df,id="ag"))
>> df
>> #here is the plot
>> p<-ggplot(df)+
>>   geom_bar(subset=.(df$variable=="males_year1"),stat="identity",aes(x=ag,y=value),fill="#6666FF")+
>>   geom_bar(subset=.(df$variable=="females_year1"),stat="identity",aes(x=ag,y=-value),fill="#FF9333")+
>>   geom_point(subset=.(df$variable=="males_year2"),stat="identity",aes(x=ag,y=value),size=3,colour="#330099")+
>>   geom_point(subset=.(df$variable=="females_year2"),stat="identity",aes(x=ag,y=-value),size=3,colour="#CC3300")+
>>   coord_flip()+
>>   theme_bw()+
>>   scale_y_continuous(limits=c(-200,200),breaks=seq(-200,200,50),labels=abs(seq(-200,200,50)))+
>>   scale_x_continuous(limits=c(0,19),breaks=seq(1,18,1),labels=abs(seq(1,18,1)))+
>>   xlab("age group")+ylab("population")+
>>   theme_bw()+
>>   xlab("age group")+
>>   ylab("population")+
>>   geom_text(y=-100,x=19.2,label="Females")+
>>   geom_text(y=100,x=19.2,label="Males")
>>
>> p
>>
>> Two questions remaining. Firstly, have I used a large amount of code
>> to achieve this or is this about right for the effect that I'm after?
>
> You have repeated some code; the following lines show up twice.
>
> theme_bw()+
> xlab("age group")+
> ylab("population")+
In addition to the repetition Rui notes, here is how I'd shorten it,
albeit not by much (and with wrapping, actually more lines):
p <- ggplot(df)+
   geom_bar(subset=.(variable=="males_year1"), stat="identity",
            position="identity", aes(x=ag,y=value), fill="#6666FF")+
   geom_bar(subset=.(variable=="females_year1"), stat="identity",
            position="identity", aes(x=ag,y=-value), fill="#FF9333")+
   geom_point(subset=.(variable=="males_year2"),
              aes(x=ag,y=value), size=3, colour="#330099")+
   geom_point(subset=.(variable=="females_year2"),
              aes(x=ag,y=-value), size=3, colour="#CC3300")+
   coord_flip()+
   theme_bw()+
   scale_y_continuous("population", limits=c(-200,200),
                      breaks=seq(-200,200,50), labels=abs)+
   scale_x_continuous("age group", limits=c(0,19.2), breaks=seq(1,18,1))+
   annotate(geom="text", y=-100, x=19.2, label="Females")+
   annotate(geom="text", y= 100, x=19.2, label="Males")
Changes:
* added position="identity" to the two geom_bar calls to suppress the
  warning message
    Stacking not well defined when ymin != 0
* dropped stat="identity" in geom_point since that is the default
* pulled the xlab and ylab into the scale_x_continuous and
  scale_y_continuous calls since you already have those calls
* simplified the labels for the scales. For x, you don't need to do
  anything to specify the labels; they work as expected based on the
  breaks. For y, rather than give a vector of labels, give a function
  which transforms the breaks into the labels you want (abs).
* converted the last two geom_text calls to annotations
* increased the limits in scale_x_continuous so that the annotations
  were not lost
* changed the subset expressions to use the bare column name (variable)
  rather than df$variable; the subset should not refer to df directly
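If your ggplot2 version does not accept the layer-level subset= argument
(I believe later releases dropped it), the same plot can be sketched by
handing each layer its own rows through data= instead. This is only a
sketch under that assumption; series() is a hypothetical helper I made up
for illustration, not a ggplot2 or plyr function.

```r
library(ggplot2)
library(reshape2)

# Same sample data as in the original post, melted to long format.
set.seed(33)
df <- data.frame(ag = 1:18,
                 males_year1   = sample(100:200, 18),
                 females_year1 = sample(100:200, 18),
                 males_year2   = sample(100:200, 18),
                 females_year2 = sample(100:200, 18))
df <- melt(df, id = "ag")

# Hypothetical helper: the rows of df that belong to one series.
series <- function(name) df[df$variable == name, ]

p <- ggplot(mapping = aes(x = ag)) +
  # Bars for year 1: males to the right, females (negated) to the left.
  geom_bar(data = series("males_year1"), aes(y = value),
           stat = "identity", position = "identity", fill = "#6666FF") +
  geom_bar(data = series("females_year1"), aes(y = -value),
           stat = "identity", position = "identity", fill = "#FF9333") +
  # Points for year 2, overlaid on the bars.
  geom_point(data = series("males_year2"), aes(y = value),
             size = 3, colour = "#330099") +
  geom_point(data = series("females_year2"), aes(y = -value),
             size = 3, colour = "#CC3300") +
  coord_flip() +
  theme_bw() +
  # labels = abs turns the negative breaks into positive labels.
  scale_y_continuous("population", limits = c(-200, 200),
                     breaks = seq(-200, 200, 50), labels = abs) +
  scale_x_continuous("age group", limits = c(0, 19.2),
                     breaks = seq(1, 18, 1)) +
  annotate(geom = "text", y = -100, x = 19.2, label = "Females") +
  annotate(geom = "text", y =  100, x = 19.2, label = "Males")
```

Printing p draws the pyramid. Everything other than the data= handling
is unchanged from the version above.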
>
> Hope this helps,
>
> Rui Barradas
>>
>> Secondly I'm quite confused about how to put a legend onto a plot like
>> this. I'm getting slowly into the ggplot way of doing things, but I'm
>> totally baffled by legends; say I wanted a legend with an appropriate
>> label for both genders and both time periods showing the colours of
>> the bars and dots I've used here as examples, how do I do this? I've
>> tried scale_fill with a bunch of arguments to no avail. I'm confused
>> about where in the hierarchy of ggplot commands you actually build the
>> legend and how you map it to your data. The usual trawl of the
>> package pdf / cook book for R etc hasn't really helped. Can someone
>> show me how to do this please?
Your confusion with legends likely comes from how ggplot approaches
legends. A legend, for ggplot, shows the mapping between an aesthetic
(color, shape, etc.) and the data values that it represents. Therefore,
it is only necessary when there is a mapping between data and
aesthetics. In your example, you manually set your colour/fill
aesthetics, so there is no mapping, so there is no legend.
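To make the set-versus-map distinction concrete, here is a minimal
sketch on made-up data (toy, grp and n are hypothetical names, not part
of your data set). The first plot sets fill to a constant and so gets no
legend; the second maps fill to a column inside aes() and so gets one.

```r
library(ggplot2)

# Hypothetical toy data, not the population data from this thread.
toy <- data.frame(grp = c("a", "b"), n = c(3, 5))

# fill is *set* to a constant outside aes(): no mapping, so no legend.
p_set <- ggplot(toy) +
  geom_bar(aes(x = grp, y = n), stat = "identity", fill = "#6666FF")

# fill is *mapped* to a data column inside aes(): ggplot2 builds a
# fill scale for it and draws the corresponding legend.
p_map <- ggplot(toy) +
  geom_bar(aes(x = grp, y = n, fill = grp), stat = "identity")
```

Print p_set and p_map one after the other to compare them.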
The variables that are implied by your data are Sex and Year, so make
those explicit:
df2 <- cbind(df, colsplit(df$variable, "_", c("Sex", "Year")))
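If it helps to see what that step produces, the following sketch
rebuilds the melted data and inspects the two new columns. The
as.character() wrapper is my own precaution (melt returns variable as a
factor) and is not required by anything in the original code.

```r
library(reshape2)

# Same sample data as in the original post, melted to long format.
set.seed(33)
df <- data.frame(ag = 1:18,
                 males_year1   = sample(100:200, 18),
                 females_year1 = sample(100:200, 18),
                 males_year2   = sample(100:200, 18),
                 females_year2 = sample(100:200, 18))
df <- melt(df, id = "ag")

# Split "males_year1" etc. at the underscore into two columns.
parts <- colsplit(as.character(df$variable), "_", c("Sex", "Year"))
df2 <- cbind(df, parts)

# One row per distinct combination, to check the split visually.
print(unique(df2[c("variable", "Sex", "Year")]))
```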
Now we can simplify the two geom_bar and two geom_point calls into one
each, setting the sign of value based on the Sex column. I set the
colour and fill to the interaction of Sex and Year (could have just used
variable, I suppose, but this is how I did it). Now colour and fill are
really two separate aesthetics, but I need to treat them the same and
include both in each geom so that it comes out right in the end. The
colors that you specify manually in the original version become
specified in the scale_colour_manual and scale_fill_manual calls:
p <- ggplot(df2) +
   geom_bar(subset=.(Year=="year1"), stat="identity",
            position="identity",
            aes(x=ag, y=ifelse(Sex=="females",-1,1)*value,
                colour=interaction(Sex,Year),
                fill=interaction(Sex,Year))) +
   geom_point(subset=.(Year=="year2"),
              aes(x=ag, y=ifelse(Sex=="females",-1,1)*value,
                  colour=interaction(Sex,Year),
                  fill=interaction(Sex,Year)),
              size=3) +
   annotate(geom="text", y=-100, x=19.2, label="Females") +
   annotate(geom="text", y= 100, x=19.2, label="Males") +
   scale_x_continuous("age group", limits=c(0,19.2), breaks=seq(1,18,1))+
   scale_y_continuous("population", limits=c(-200,200),
                      breaks=seq(-200,200,50), labels=abs) +
   scale_colour_manual("Sex and Year",
                       breaks = c("females.year1", "females.year2",
                                  "males.year1", "males.year2"),
                       values = c("#FF9333", "#CC3300",
                                  "#6666FF", "#330099")) +
   scale_fill_manual("Sex and Year",
                     breaks = c("females.year1", "females.year2",
                                "males.year1", "males.year2"),
                     values = c("#FF9333", "#CC3300",
                                "#6666FF", "#330099")) +
   coord_flip() +
   theme_bw()
Conceptually, you are mapping Sex to a color and Year to a shade of that
color, but ggplot does not have a way of splitting up aspects of colors
to different data variables (hue to one variable, saturation to another,
say), so this hack of mapping the combination of them is used.
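The break names such as "females.year1" are just the level names that
interaction() builds by joining the two values with a dot. A small
base-R sketch (sex, year and pal are names I chose for illustration)
shows this, along with a named colour vector that makes the pairing of
level to colour explicit instead of depending on ordering:

```r
# One observation per Sex/Year combination.
sex  <- c("females", "males", "females", "males")
year <- c("year1", "year1", "year2", "year2")

# interaction() makes one factor whose levels are "<sex>.<year>".
combo <- interaction(sex, year)
print(levels(combo))

# Named vector: each level name is tied to its colour directly.
pal <- c("females.year1" = "#FF9333", "females.year2" = "#CC3300",
         "males.year1"   = "#6666FF", "males.year2"   = "#330099")

# Look up the colour for each observation by its level name.
cols <- unname(pal[as.character(combo)])
print(cols)
```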
You can get the legend to be a bit more like a grid by adding
guide = guide_legend(nrow=2)
to both scale_colour_manual and scale_fill_manual. The labels in those
can also be changed to something better (be sure to do it in both).
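Putting those pieces together, here is a sketch of the legend version
with the two-row guide and friendlier labels. It makes several
assumptions of my own: each layer gets its rows through data= rather
than subset=, the signed value and the combined group are precomputed as
columns (signed and group are names I made up), the colours are passed
as a named vector, and the label text is only a suggestion.

```r
library(ggplot2)
library(reshape2)

# Same sample data as in the original post, melted and split.
set.seed(33)
df <- data.frame(ag = 1:18,
                 males_year1   = sample(100:200, 18),
                 females_year1 = sample(100:200, 18),
                 males_year2   = sample(100:200, 18),
                 females_year2 = sample(100:200, 18))
df <- melt(df, id = "ag")
df2 <- cbind(df, colsplit(as.character(df$variable), "_", c("Sex", "Year")))

# Females go to the left (negative), males to the right (positive).
df2$signed <- ifelse(df2$Sex == "females", -1, 1) * df2$value
df2$group  <- interaction(df2$Sex, df2$Year)

# Named colours, plus suggested legend labels in the same order.
pal <- c("females.year1" = "#FF9333", "females.year2" = "#CC3300",
         "males.year1"   = "#6666FF", "males.year2"   = "#330099")
lab <- c("Females, year 1", "Females, year 2",
         "Males, year 1", "Males, year 2")

p <- ggplot(df2, aes(x = ag, y = signed, colour = group, fill = group)) +
  geom_bar(data = subset(df2, Year == "year1"),
           stat = "identity", position = "identity") +
  geom_point(data = subset(df2, Year == "year2"), size = 3) +
  annotate(geom = "text", y = -100, x = 19.2, label = "Females") +
  annotate(geom = "text", y =  100, x = 19.2, label = "Males") +
  scale_x_continuous("age group", limits = c(0, 19.2),
                     breaks = seq(1, 18, 1)) +
  scale_y_continuous("population", limits = c(-200, 200),
                     breaks = seq(-200, 200, 50), labels = abs) +
  # Identical name, breaks and labels in both scales so the colour
  # and fill legends merge into one.
  scale_colour_manual("Sex and Year", values = pal, breaks = names(pal),
                      labels = lab, guide = guide_legend(nrow = 2)) +
  scale_fill_manual("Sex and Year", values = pal, breaks = names(pal),
                    labels = lab, guide = guide_legend(nrow = 2)) +
  coord_flip() +
  theme_bw()
```

Keeping pal and lab in single variables is one way to make sure the two
manual scales cannot drift apart.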
>> Many thanks.
>>
>> Gavin.
>>
>
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University