[R] Grouped boxplots using ggplot() from ggplot2.

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Jul 28 16:54:15 CEST 2018


1) I don't know... it looks to me like you did not run my code. I have 
included a complete reprex below... try it out in a fresh session. If you 
still get the problem, check your sessionInfo package versions against 
mine.

2) This still smells like your fill parameter is inside the aes function 
with Type as value. This causes a legend to be created, and since that 
legend has a different name ("Type") than the colour scale, they are 
separated. Confirm that you are using fill outside the aes function 
(because you don't want fill to depend on the data) and have the constant 
NULL as value (so it won't generate any fill graphical representation).

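3) I missed that... limits set with ylim() or scale_y_continuous(limits=) 
constrain which data are included as input into the graph, so values 
outside them are dropped before the boxplot statistics are computed. The 
coord_cartesian function forces the limits as desired without discarding 
any data.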

4) While showing outliers is a standard semantic feature of boxplots, 
whether produced by ggplot, lattice, base graphics or a non-R solution, you 
can please the client by making the outliers transparent (outlier.alpha = 0 
in geom_boxplot).
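
To see point 3 numerically, here is a minimal sketch on made-up data (the 
`toy` data frame and the 0 to 10 range are my own illustration, not part of 
the original problem). It compares the median that ggplot2 computes when 
the range is set as a scale limit with the median it computes when the 
range is set through coord_cartesian.

```r
library(ggplot2)

# Toy data (hypothetical): nine ordinary scores plus one extreme value.
toy <- data.frame(Group = "a", Score = c(1:9, 100))

base_plot <- ggplot(toy, aes(x = Group, y = Score)) + geom_boxplot()

# Scale limits censor out-of-range values before the statistics are computed.
p_limits <- base_plot + scale_y_continuous(limits = c(0, 10))

# coord_cartesian() only zooms the view; every value still feeds the statistics.
p_zoom <- base_plot + coord_cartesian(ylim = c(0, 10))

# layer_data() returns the computed boxplot statistics; "middle" is the median.
# suppressWarnings() hides the "removed rows" warning caused by the censoring.
median_limits <- suppressWarnings(layer_data(p_limits)$middle)
median_zoom   <- layer_data(p_zoom)$middle

c(limits = median_limits, zoom = median_zoom)
```

The two medians should differ, because the value 100 is dropped in the 
first plot but only hidden from view in the second.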

There is a link to the generated image below.

################
# Simulate some data:
Type <- rep( c( "National", "Local" ), each = 250 )
M0   <- 1300+50*(0:4)
set.seed( 42 )
M1   <- M0 + runif( 5, -100, -50 )
X0   <- rnorm( 250, rep( M0, each = 50 ), 150 )
X1   <- rnorm( 250, rep( M1, each = 50 ), 100 )

library(ggplot2)
Year <- factor( rep( 4:8, each = 50, times = 2)
               , levels = 0:8 )
DemoDat <- data.frame( Year = Year
                      , Score = c( X0, X1 )
                      , Type = Type
                      )

ggplot( data = DemoDat
       , aes( x = Year
            , y = Score
            , color = Type
            )
       , fill = NULL
       ) +
     geom_boxplot( position = position_dodge( 1 )
                 , outlier.alpha = 0
                 ) +
     theme_minimal() +
     scale_colour_manual( name = "National v. Local"
                        , values = c( "red", "black" ) ) +
     scale_x_discrete( drop = FALSE ) +
     scale_y_continuous( breaks=seq( 700, 2100, 100 ) ) +
     coord_cartesian( ylim = c( 700, 2100 ) )

# ![](https://i.imgur.com/wUVYU5H.png)

#' Created on 2018-07-28 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
################
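
For anyone who wants to see the two-legend effect from point 2 in 
isolation, the following is a rough sketch on made-up data (the `toy` data 
frame is hypothetical). It uses a variant of the code above rather than the 
code itself: the fill constant is set to NA directly on geom_boxplot, and 
outliers are hidden with outlier.shape = NA, which the geom_boxplot 
documentation describes as hiding the points without removing them.

```r
library(ggplot2)

# Toy data (hypothetical): two years, two types, ten scores per box.
toy <- data.frame(
  Year  = factor(rep(c("4", "5"), each = 20)),
  Type  = rep(c("National", "Local"), times = 20),
  Score = c(seq(1000, 1190, by = 10), seq(1100, 1290, by = 10))
)

colour_scale <- scale_colour_manual(name = "National v. Local",
                                    values = c("red", "black"))

# fill mapped inside aes(): a fill scale is created with the default title
# "Type", which differs from the colour scale's title, so two legends appear.
p_mapped <- ggplot(toy, aes(x = Year, y = Score, colour = Type, fill = Type)) +
  geom_boxplot(position = position_dodge(1)) +
  colour_scale

# fill set as a constant on the layer: no fill scale, only the colour legend.
p_const <- ggplot(toy, aes(x = Year, y = Score, colour = Type)) +
  geom_boxplot(position = position_dodge(1), fill = NA, outlier.shape = NA) +
  colour_scale

# Inspect the fill values actually used to draw each plot.
fill_mapped <- unique(layer_data(p_mapped)$fill)
fill_const  <- unique(layer_data(p_const)$fill)
```

Printing p_mapped and p_const side by side should show the difference: the 
first has filled boxes and a second legend, the second has neither.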


> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ggplot2_3.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17     pillar_1.2.3     compiler_3.4.4   plyr_1.8.4       bindr_0.1.1      tools_3.4.4
 [7] digest_0.6.15    memoise_1.1.0    evaluate_0.10.1  tibble_1.4.2     gtable_0.2.0     debugme_1.1.0
[13] pkgconfig_2.0.1  rlang_0.2.1      reprex_0.2.0     rstudioapi_0.7   yaml_2.1.19      bindrcpp_0.2.2
[19] stringr_1.3.1    withr_2.1.2      dplyr_0.7.6      knitr_1.20       devtools_1.13.6  rprojroot_1.3-2
[25] grid_3.4.4       tidyselect_0.2.4 glue_1.2.0       R6_2.2.2         processx_3.1.0   rmarkdown_1.10
[31] clipr_0.4.1      purrr_0.2.5      callr_2.0.4      magrittr_1.5     whisker_0.3-2    scales_0.5.0
[37] backports_1.1.2  htmltools_0.3.6  assertthat_0.2.0 colorspace_1.3-2 stringi_1.2.3    lazyeval_0.2.1
[43] munsell_0.5.0    crayon_1.3.4
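
If comparing the listing above by eye is tedious, something like the 
following sketch may help. The choice of three packages is only an 
illustration, with the reference versions copied from the output above; 
extend the `reference` vector as needed.

```r
# Reference versions copied from the sessionInfo() output above.
reference <- c(ggplot2 = "3.0.0", scales = "0.5.0", gtable = "0.2.0")

# Look up the locally installed version of each package (NA if not installed).
installed <- vapply(names(reference), function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}, character(1))

# One row per package, flagging whether the versions match exactly.
version_check <- data.frame(
  reference = reference,
  installed = installed,
  same      = !is.na(installed) & installed == reference
)
version_check
```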



On Sat, 28 Jul 2018, Rolf Turner wrote:

>
> On 28/07/18 17:03, Jeff Newmiller wrote:
>
>> Once you understand how strongly the data controls ggplot, using it gets 
>> much easier. I still have to google details sometimes though. 
>> Note that it can be very difficult to make a weird plot (e.g. multiple 
>> parallel axes) in ggplot because it is very internally consistent... a 
>> blessing and a curse.
>> 
>> 1) Colour is assigned in the scale according to order of levels of the 
>> factor. Note that while they are both discrete, the so-called "discrete" 
>> scales auto-colour, but "manual" scales require you to specify the exact 
>> colour sequence.
>> 
>> 2) Assigning constants to properties is done outside the mapping (aes). 
>> Note that "colour" is for lines and shape outlines, while "fill" is the 
>> colour used to fill in shapes. When the names of these two scales are the 
>> same and the values are the same, the legends will merge. If not, they 
>> will be shown separately.
>> 
>> 3) Discrete scales are controlled by the levels in the data. To prevent 
>> ggplot from removing missing levels, use the drop=FALSE argument.
>> 
>> 4) Breaks are a property of the scale.
>> 
>> My changes were:
>> 
>> Year <- factor( rep( 4:8, each = 50, times = 2 ), levels = 0:8 )
>> DemoDat <- data.frame(Year = Year, Score = c( X0 , X1 ), Type = Type )
>> 
>> ggplot( data = DemoDat
>>        , aes( x = Year, y = Score, color = Type )
>>        , fill = NULL
>>        ) +
>>      geom_boxplot( position = position_dodge(1) ) +
>>      theme_minimal() +
>>      scale_colour_manual( name = "National v. Local"
>>                         , values = c( "red", "black" ) ) +
>>      scale_x_discrete( drop = FALSE ) +
>>      scale_y_continuous( breaks = seq( 700, 2100, 100 ) )
>> 
>> Good luck with your graphics grammar!
>
> Dear Jeff,
>
> Thanks very much for this cogent advice and for taking the trouble to steer 
> me in the right direction.  However I am not quite out of the woods yet.
>
> (1) I'm still getting two legends.  How do I stop this from happening?
>
> (2) The boxes are "filled" (with pinkish and blueish colours --- which are 
> referenced in the second of the two legends that I get).  How can I get 
> "unfilled" boxes?
>
> (3) The y-axis scale runs only from 800 to 1800, rather than from 700 to 
> 2100.  How can I force it to run from 700 to 2100?
>
> (4) With the modified code we now get some "outliers" (points beyond the 
> whisker tips) plotted --- which I didn't get before (and don't want, because 
> "last year's" graphics did not include outliers).  How can I suppress the 
> plotting of outliers?
>
> I have attached a pdf containing the results of running the code that
> you provided, so that you can readily see what is happening.
>
> May I prevail upon your good graces to enlighten me about questions
> (1) --- (4) above?
>
> Ever so humbly grateful.
>
> cheers,
>
> Rolf
>
> -- 
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil using dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------

