[R] ggplot2: multiple box plots, different tibbles/dataframes

Avi Gross
Thu Nov 11 18:10:35 CET 2021


Rich,

Boxplots, like many other things in ggplot, can be grouped in various ways. I
often do something like this:

Say I have a data.frame with columns called PLACE and MEASURE, among others.
The one I call PLACE would be a factor containing the locations you are
measuring at. I mean it would start as character strings naming your N
places, but the factor levels would be set in the order you want the results
in. The MEASURE variable in each row would contain one of the many measures
at that location. You probably would have other columns like DATE.

To display multiple boxplots subdivided by place is as easy as mapping PLACE
inside an aes() clause, like:

	ggplot(your_data, aes(..., color=PLACE)) + geom_boxplot()

There are other variants, such as using group=, but this usually works fine
for up to about six places, after which the default colors get hard to tell
apart.
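
As a rough sketch of that aes() phrase, assuming a made-up data.frame named
your_data with the PLACE and MEASURE columns described above (the place names
"A" to "D" and all values are simulated, not real gauge data):

```r
library(ggplot2)

# Simulated stand-in data: four hypothetical places, 30 measures each.
set.seed(42)
your_data <- data.frame(
  PLACE   = factor(rep(c("A", "B", "C", "D"), each = 30),
                   levels = c("A", "B", "C", "D")),
  MEASURE = rlnorm(120, meanlog = 5, sdlog = 1)
)

# One boxplot per place, each drawn in its own color.
p_color <- ggplot(your_data, aes(x = PLACE, y = MEASURE, color = PLACE)) +
  geom_boxplot()

# Variant: group= splits the data the same way without mapping a color.
p_group <- ggplot(your_data, aes(x = PLACE, y = MEASURE, group = PLACE)) +
  geom_boxplot()

print(p_color)
```

Either version should give one box per level of PLACE, in the order of the
factor levels.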

My impression is that you want to sort of force ggplot to keep forgetting
earlier data and commence anew on new data and other attributes. Yes, there
is a way to ask ggplot to not inherit things from step to step and take new
instructions BUT you may not be aware of how different the ggplot paradigm
is compared to other graphics engines.
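
For what it is worth, the "do not inherit" route looks roughly like the
sketch below. Each layer can be handed its own data= and its own aes(), and
inherit.aes = FALSE tells that layer to ignore what was set in ggplot(). The
data frames site_a and site_b, their column names, and the values are all
invented for illustration:

```r
library(ggplot2)

# Two unrelated, simulated data frames with different column names and sizes.
set.seed(1)
site_a <- data.frame(flow_a = rlnorm(40, meanlog = 5))
site_b <- data.frame(flow_b = rlnorm(60, meanlog = 6))

# The first layer uses the data and mappings given to ggplot().
# The second layer brings its own data and mappings and inherits nothing.
p <- ggplot(site_a, aes(x = flow_a, y = "Site A")) +
  geom_boxplot() +
  geom_boxplot(data = site_b, aes(x = flow_b, y = "Site B"),
               inherit.aes = FALSE) +
  labs(x = "MEASURE", y = "PLACE")

print(p)
```

This works, but every extra data set needs its own hand-written layer, which
is part of why reshaping the data first tends to be less trouble.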

Some of the others draw repeatedly as you go along, and new layers sort of
overlay old layers. Ggplot draws nothing while you build. It creates a
large, complex object and populates it as it goes along. Some changes in
settings simply overwrite earlier ones, which are then gone. The plotting
can be done at any later time (or automatically when the expression
finishes at the console) using one of the methods known to print(). So some
of what you want may not work well, as ggplot normally does not want to
store multiple data sets.
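
A small sketch of that build-now, draw-later behavior, using the mtcars data
set that ships with R purely as a stand-in:

```r
library(ggplot2)

# Building the object draws nothing; p is only a description of the plot.
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg))
p <- p + geom_boxplot()          # adds a layer to the object
p <- p + labs(title = "first")   # sets a title ...
p <- p + labs(title = "second")  # ... which this later call replaces

# The result is an object you can store, pass around, or keep modifying.
is_plot_object <- inherits(p, "ggplot")

# Drawing happens only here (typing p at the console calls print() for you).
print(p)
```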

In one sense, I would say the ggplot way is to focus on getting your data
into the right shape. In your case, have you considered reading your
multiple data sets into df1 through df4 or whatever and making changes so
each has a new column called something like PLACE that holds one value for
all rows in df1, another for all rows in df2, and so on?

When you have done that and made all the dfN have the same column names and
the same number of columns, you can combine them into one df_combined using
something like rbind().

You can then convert the PLACE column in df_combined into a factor whose
levels are in the order you want, based on the compass.
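
Those steps might look something like this in base R. The four data frames
here are simulated stand-ins (real ones would be read from your files), and
the place names are placeholders I made up:

```r
# Simulated stand-ins for four per-site data frames with different record lengths.
set.seed(2021)
df1 <- data.frame(DATE = as.Date("2020-01-01") + 0:9,  MEASURE = rlnorm(10, 5.0))
df2 <- data.frame(DATE = as.Date("2015-01-01") + 0:19, MEASURE = rlnorm(20, 5.5))
df3 <- data.frame(DATE = as.Date("2018-01-01") + 0:14, MEASURE = rlnorm(15, 6.0))
df4 <- data.frame(DATE = as.Date("2010-01-01") + 0:29, MEASURE = rlnorm(30, 6.5))

# Tag every row of each data frame with its (hypothetical) place name.
df1$PLACE <- "Southmost"
df2$PLACE <- "South"
df3$PLACE <- "North"
df4$PLACE <- "Northmost"

# Same column names everywhere, so the rows can simply be stacked.
df_combined <- rbind(df1, df2, df3, df4)

# Make PLACE a factor with the levels in compass order, south to north.
df_combined$PLACE <- factor(
  df_combined$PLACE,
  levels = c("Southmost", "South", "North", "Northmost")
)

levels(df_combined$PLACE)
table(df_combined$PLACE)
```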

What you have then can be given to ggplot as described above. Note that in
some places ggplot treats a factor by the order of its levels, i.e. it may
see it as containing 1, 2, and so on. So the easiest way to control the
order is to set the levels before you call ggplot. Somewhat more advanced
users can do odd things in midstream, like calling factor() or reorder() on
a column inside aes().
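
Here is a sketch of both routes for horizontal boxplots stacked in one frame.
It assumes ggplot2 3.3.0 or later, which picks the horizontal orientation
when x is continuous and y is discrete. The data and the site names are
simulated placeholders:

```r
library(ggplot2)

# Simulated combined data; PLACE starts out as plain character strings.
set.seed(7)
df_combined <- data.frame(
  PLACE   = rep(c("North", "MidNorth", "MidSouth", "South"),
                times = c(30, 20, 25, 15)),
  MEASURE = rlnorm(90, meanlog = 5, sdlog = 1)
)

# Desired order: the first level lands at the bottom of the y axis.
site_order <- c("South", "MidSouth", "MidNorth", "North")

# Midstream route: re-level inside aes(), leaving the data frame untouched.
p_inline <- ggplot(df_combined,
                   aes(x = MEASURE, y = factor(PLACE, levels = site_order))) +
  geom_boxplot() +
  labs(y = "PLACE")

# Usual route: fix the factor levels first, then plot.
df_combined$PLACE <- factor(df_combined$PLACE, levels = site_order)
p <- ggplot(df_combined, aes(x = MEASURE, y = PLACE)) +
  geom_boxplot()

print(p)
```

Both plots should show the same four boxes with "South" at the bottom and
"North" at the top.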



-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Rich Shepard
Sent: Thursday, November 11, 2021 8:50 AM
To: r-help using r-project.org
Subject: Re: [R] ggplot2: multiple box plots, different tibbles/dataframes

On Wed, 10 Nov 2021, Avi Gross via R-help wrote:

> I think many here may not quite have enough info to help you.

Avi,

Actually, you've reflected my thinking.

> But the subject of multiple plots has come up. There are a slew of 
> ways, especially in the ggplot paradigm, to make multiple smaller 
> plots into a larger display showing them in some number of rows and 
> columns, or other ways. Some methods use facet_wrap() or facet_grid() 
> type functionality that let you plot multiple subdivisions of the data 
> independently. These though generally have to be in some way related.

My experience with facets (which I believe are like lattice's conditioned
trellis plots) has each plot in a separate frame in a row, column, or
matrix. That won't communicate what I want viewers to see as well as would
having all in a single frame.

My data represent hydrologic and geochemical conditions at four locations
along the mainstem of a river. While the period of record for each
monitoring gauge is different, I want to illustrate how highly variable
conditions are at each location. The major factor of interest is discharge,
the volume of water passing a river cross section at the gauge location in
cubic feet per second. I have created boxplots for each site representing
the distribution of discharge for the entire data set and I'd like to place
each of the four horizontal boxplots stacked vertically with the
southern-most at the bottom and the northern-most at the top (the river
flows north).

> Yet others let you make many independent graphs and save them and 
> later recombine them in packages like cowplot.

I discovered cowplot yesterday but haven't yet read the PDF or vignette.

> So, although it may also be possible to do whatever it is you want 
> within a single plot, it may also make sense to do it as loosely 
> described above.

While I certainly may be wrong, I believe that seeing all four boxplots in
the same frame makes the differences in distribution most clear.

Thanks,

Rich
