[R] ggplot2: multiple box plots, different tibbles/dataframes

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Thu Nov 11 20:03:15 CET 2021


Rich,

This list is not really the place for questions about packages, but since this
discussion persists, I will supply you with SAMPLE code thrown together in just
a few minutes to illustrate the IDEAS; yours would obviously be tweaked to your
needs. I made a very small amount of data to illustrate several approaches
and did not worry about the X dimension. You may well want to use
other variants, such as facet_grid(), if they do something closer to what you
want.

I then threw in the cowplot package as an example of doing it another way. This
is more useful for combining heterogeneous graphs. Many of the things I show
just as a silly example have alternatives, and there are multiple packages that
do similar (and also different) things to cowplot if you want to tune
the output with other niceties.

If you copy the code below (installing any needed packages first), it
should run on your machine. Run it in chunks so you can see the graphs
one at a time.

#START of code

# Load libraries needed, using install.packages() first if needed.
library(tidyverse)

# Make sample data AS IF you have already read in from file and converted.
df1 <- data.frame(site_nbr=1, DATE=1:5,
                  cfs=c(11900,11800,11900,11700,11800))
df2 <- data.frame(site_nbr=2, DATE=3:7,
                  cfs=c(12900,12600,12900,12700,12300))

# Combine all your data into one df.
df <- rbind(df1, df2)
# rm (df1, df2)

# Make a factor in the order you want.
df$site_nbr <- factor(x=df$site_nbr, levels=c(2,1))

# ready for a ggplot segmented by site_nbr.
ggplot(data=df, aes(x=NULL, y=cfs)) +
  geom_boxplot(aes(group=site_nbr))

# Or use color instead for a more specific grouping.
ggplot(data=df, aes(x=NULL, y=cfs)) +
  geom_boxplot(aes(color=site_nbr))

# Or make multiple lattice-like plots, default may be horizontal.
ggplot(data=df, aes(x=NULL, y=cfs)) +
  geom_boxplot() +
  facet_wrap(~site_nbr)

# Or make multiple lattice-like plots, specifying you want vertical.
ggplot(data=df, aes(x=NULL, y=cfs)) +
  geom_boxplot() +
  facet_wrap(~site_nbr, nrow=2)

# The panels above share one y scale; scales="free" lets each panel have its own.
ggplot(data=df, aes(x=NULL, y=cfs)) +
  geom_boxplot() +
  facet_wrap(~site_nbr, nrow=2, ncol=1, scales="free")

# ALTERNATE METHOD of making multiple plots and combining them later.
library(cowplot)

# Recreate the sample data so this part can run on its own:
df1 <- data.frame(site_nbr=1, DATE=1:5,
                  cfs=c(11900,11800,11900,11700,11800))
df2 <- data.frame(site_nbr=2, DATE=3:7,
                  cfs=c(12900,12600,12900,12700,12300))

# Create and save two ggplots, or more in your case:

p1 <- ggplot(data=df1, aes(x=NULL, y=cfs)) +
  geom_boxplot(color="red", fill="yellow")


p2 <- ggplot(data=df2, aes(x=NULL, y=cfs)) +
  geom_boxplot(color="green", fill="pink")

# Combine the two (or more) vertically using plot_grid() from cowplot.
plot_grid(p1, p2, ncol=1)

#END OF CODE
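
The code above leaves the X dimension empty. As a rough sketch only, and not a tested recipe for the real data, mapping the site factor to x gives one labelled box per site, and facet_grid() is the variant mentioned earlier. The names flows, p_side and p_rows and the axis labels are made up for illustration; the data repeat the toy values used above.

```r
library(ggplot2)

# Hypothetical sample data, same shape and values as the toy data above.
flows <- rbind(
  data.frame(site_nbr = 1, DATE = 1:5,
             cfs = c(11900, 11800, 11900, 11700, 11800)),
  data.frame(site_nbr = 2, DATE = 3:7,
             cfs = c(12900, 12600, 12900, 12700, 12300))
)

# A factor gives discrete x positions, drawn in the order of its levels.
flows$site_nbr <- factor(flows$site_nbr, levels = c(2, 1))

# One box per site, side by side, labelled on the x axis.
p_side <- ggplot(flows, aes(x = site_nbr, y = cfs)) +
  geom_boxplot() +
  labs(x = "Site", y = "Flow (cfs)")
print(p_side)

# facet_grid() variant: one row of panels per site, each with its own y scale.
p_rows <- ggplot(flows, aes(x = "", y = cfs)) +
  geom_boxplot() +
  facet_grid(rows = vars(site_nbr), scales = "free_y") +
  labs(x = NULL, y = "Flow (cfs)")
print(p_rows)
```

With four sites, the same calls should apply unchanged once all four data frames are combined and the factor levels list the sites in the order you want.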

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rich Shepard
Sent: Thursday, November 11, 2021 1:25 PM
To: r-help using r-project.org
Subject: Re: [R] ggplot2: multiple box plots, different tibbles/dataframes

On Thu, 11 Nov 2021, Avi Gross via R-help wrote:

> Say I have a data.frame with columns called PLACE and MEASURE others.
> The one I call PLACE would be a factor containing the locations you
> are measuring at. I mean it would be character strings of your N
> places but the factors would be made in the order you want the results
> in. The MEASURE variable in each row would contain one of the many
> measures at that location. You probably would have other columns like
> DATE.

Avi/Jeff/Burt,

Here are the head and tail of one data file:
site_nbr,year,mon,day,hr,min,tz,cfs
14174000,1986,10,01,00,30,PDT,11900
14174000,1986,10,01,01,00,PDT,11900
14174000,1986,10,01,01,30,PDT,11900
14174000,1986,10,01,02,00,PDT,11800
14174000,1986,10,01,02,30,PDT,11800
14174000,1986,10,01,03,00,PDT,11800
14174000,1986,10,01,03,30,PDT,11800
14174000,1986,10,01,04,00,PDT,11800
14174000,1986,10,01,04,30,PDT,11800
...
14174000,2021,09,30,23,12,PDT,5070
14174000,2021,09,30,23,17,PDT,5070
14174000,2021,09,30,23,22,PDT,5050
14174000,2021,09,30,23,27,PDT,5050
14174000,2021,09,30,23,32,PDT,5050
14174000,2021,09,30,23,37,PDT,5050
14174000,2021,09,30,23,42,PDT,5050
14174000,2021,09,30,23,47,PDT,5050
14174000,2021,09,30,23,52,PDT,5050
14174000,2021,09,30,23,57,PDT,5050

(Water years begin October 1st and end September 30th.)

The other three locations have the same format.

The boxplots for each PLACE (site_nbr) should summarize all MEASURE (cfs)
values for all recorded data (DATE).

The R tibbles have a datetime column which could be the DATE.
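
For readers following along, here is a minimal sketch of how such a datetime column might be built from the separate year/mon/day/hr/min columns using base R. It uses only the first three rows of the sample shown above. The object names txt and flow are hypothetical, and the time zone "America/Los_Angeles" is an assumption based on the PDT label in the data.

```r
# First three rows of the sample shown above, read as if from a file.
txt <- "site_nbr,year,mon,day,hr,min,tz,cfs
14174000,1986,10,01,00,30,PDT,11900
14174000,1986,10,01,01,00,PDT,11900
14174000,1986,10,01,01,30,PDT,11900"
flow <- read.csv(text = txt)

# Build a date-time from the separate columns (seconds set to 0).
# ASSUMPTION: times are US Pacific local time; adjust tz to suit your data.
flow$DATE <- ISOdatetime(flow$year, flow$mon, flow$day,
                         flow$hr, flow$min, 0,
                         tz = "America/Los_Angeles")

# Site as a factor, ready to be used for grouping.
flow$site_nbr <- factor(flow$site_nbr)

str(flow[, c("site_nbr", "DATE", "cfs")])
```

ISOdatetime() returns a POSIXct vector, so the result can be used directly on a date-time axis or ignored when only the per-site boxplots are wanted.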

If I assemble all 4 sites into a single tibble I suppose it would have three
columns: PLACE (the grouping factor), DATE (on the x axis), and MEASURE (cfs
on the y axis). Each boxplot would be grouped, so the command would be:

disc_plot <- ggplot(df, aes(x = group, y = cfs)) +
  geom_boxplot()

Is this close?

Thanks,

Rich

______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


