[R] Newbie: Controlling legends in graphs

Rui Barradas
Fri May 12 22:06:22 CEST 2023


At 14:24 on 12/05/2023, Kevin Zembower via R-help wrote:
> Hello, I'm trying to create a line graph with a legend, but have no
> success controlling the legend. Since nothing I've tried seems to work,
> I must be doing something systematically wrong. Can anyone point this
> out to me?
> 
> Here's my data:
>   > weights
> # A tibble: 1,246 × 3
>      Date           J     K
>      <date>     <dbl> <dbl>
>    1 2000-02-13   133  188
>    2 2000-02-20   134  185
>    3 2000-02-27   135  187
>    4 2000-03-05   135  185
>    5 2000-03-12    NA  184
>    6 2000-03-19    NA  184.
>    7 2000-03-26   136  184.
>    8 2000-04-02   134  185
>    9 2000-04-09   133  186
>   10 2000-04-16    NA  186
> # ℹ 1,236 more rows
> # ℹ Use `print(n = ...)` to see more rows
>   >
> 
> Here are my attempts. You can see some of the things I've tried in the
> commented out sections:
> weights %>%
>       group_by(year(Date)) %>%
>       summarize(
>           m_K = mean(K, na.rm = TRUE),
>           m_J = mean(J, na.rm = TRUE),
>           ) %>%
>       ggplot(aes(x = `year(Date)`)) +
>       geom_point(aes(y = m_K, color = "red")) +
>       geom_smooth(aes(y = m_K, color = "red")) +
>       geom_point(aes(y = m_J, color = "blue")) +
>       geom_smooth(aes(y = m_J, color = "blue")) +
>       guides(size = "legend",
>              shape = "legend")
>       ## scale_shape_discrete(name="Person",
>       ##                      breaks=c("m_K", "m_J"),
>       ##                      labels=c("K", "J"))
>       ## theme(legend.title=element_blank())
> 
> When this runs, the blue line for "K" is above the red line for "J", as
> I expect, but in the legend, the red is shown first, and labeled "blue."
> 
> I'd like to be able to create a legend where the first entry shows a
> blue line and is labeled "K" and the second is red and labeled "J".
> 
> On a different but related topic, I'd welcome any advice or suggestions
> on my methodology in this example. Is this the correct way to summarize
> with a mean? Do I need the two sets of geom_point and geom_line clauses
> to create this graph, or is there a better way?
> 
> Thanks for all your advice and guidance.
> 
> -Kevin
> 
> 
Hello,

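This is mainly a data reshaping problem. Instead of plotting the two
variables, J and K, separately, put the data in the long format, map the
column holding the variable names to the color aesthetic and call each
geom_* only once. Then assign the colors you want with scale_color_manual.
Note that in your code color = "red" inside aes() is treated as a data
value, not as a color, which is why the legend labels do not match the
colors drawn.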
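
A minimal sketch of the reshaping step on its own, assuming made-up yearly means (the data frame `wide` and its values are hypothetical, not taken from your data). It shows what the long format looks like before it reaches ggplot:

```r
library(tidyr)

# hypothetical yearly means in wide format: one column per person
wide <- data.frame(
  Year = c(2001, 2002),
  J = c(134, 136),
  K = c(185, 187)
)

# long format: one row per Year/person pair;
# by default pivot_longer() calls the new columns "name" and "value"
long <- pivot_longer(wide, -Year)
long
```

The `name` column is then the one to map to the color aesthetic.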

As for placing K above J in the legend, note that ggplot orders the legend
entries alphabetically unless you coerce the column to a factor with the
levels in the order you want.
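
To illustrate the point about level order, here is a small sketch in base R; the vector `name` is a hypothetical stand-in for the column of variable names:

```r
# hypothetical column of variable names, as in the long format
name <- c("J", "K", "J", "K")

# default: levels are sorted alphabetically
default_levels <- levels(factor(name))

# explicit levels: K comes first, so K would be listed first in the legend
custom_levels <- levels(factor(name, levels = c("K", "J")))

default_levels
custom_levels
```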

Also, if you want to compute aggregate statistics for several columns, 
use ?across. See the code below.
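
A short sketch of across() on a toy example, assuming a hypothetical data frame `df` with a grouping column `g`. The anonymous function form is used because recent dplyr versions deprecate passing extra arguments such as na.rm through across() itself:

```r
library(dplyr)

# hypothetical data: two measurement columns, one with a missing value
df <- data.frame(
  g = c("a", "a", "b", "b"),
  J = c(1, NA, 3, 5),
  K = c(2, 4, 6, 8)
)

# one mean per group for every column from J to K, ignoring NAs
res <- df %>%
  group_by(g) %>%
  summarize(across(J:K, \(x) mean(x, na.rm = TRUE)))
res
```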

Here is a complete example. I have augmented your data set in order to 
have more years to plot.



# augment the data set
library(lubridate)  # needed here already for years()
weights <- " Date           J     K
   1 2000-02-13   133  188
   2 2000-02-20   134  185
   3 2000-02-27   135  187
   4 2000-03-05   135  185
   5 2000-03-12    NA  184
   6 2000-03-19    NA  184.
   7 2000-03-26   136  184.
   8 2000-04-02   134  185
   9 2000-04-09   133  186
10 2000-04-16    NA  186"
weights <- read.table(text = weights, header = TRUE)
weights$Date <- as.Date(weights$Date)
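# simulate ten more years: shift the dates by y years and add random noise
set.seed(2023)  # arbitrary seed, only to make the example reproducible
tmp <- weights
tmp <- lapply(1:10, \(y) {
   tmp$Date <- years(y) + tmp$Date
   tmp$J <- tmp$J + sample(-10:10, nrow(weights), TRUE)
   tmp$K <- tmp$K + sample(-10:10, nrow(weights), TRUE)
   tmp
})
# stack the ten simulated years into one data frame
weights <- do.call(rbind, tmp)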

#---

# plot code
library(ggplot2)
library(dplyr)
library(tidyr)
library(lubridate)

weights %>%
     mutate(Year = year(Date)) %>%
     group_by(Year) %>%
     summarize(across(J:K, \(x) mean(x, na.rm = TRUE))) %>%
     # now reshape the data
     pivot_longer(-Year) %>%
     # uncomment the next line if you want K
     # to show up on top in the legend
     # mutate(name = factor(name, levels = c("K", "J"))) %>%
     ggplot(aes(Year, value, color = name)) +
     geom_smooth(
         formula = y ~ x,
         method = lm,
         se = FALSE
     ) +
     geom_point() +
     scale_color_manual(values = c(J = "red", K = "blue"))
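
As an alternative to the factor coercion, the legend title and order can probably also be set in the scale itself. This is only a sketch under assumptions: the data frame `long` and its values are made up, and "Person" is the title taken from your commented-out code.

```r
library(ggplot2)

# hypothetical long-format data, shaped like the output of pivot_longer()
long <- data.frame(
  Year = rep(2001:2004, each = 2),
  name = rep(c("J", "K"), times = 4),
  value = c(134, 185, 135, 186, 133, 188, 136, 187)
)

p <- ggplot(long, aes(Year, value, color = name)) +
  geom_point() +
  geom_line() +
  scale_color_manual(
    name = "Person",                    # legend title
    values = c(J = "red", K = "blue"),  # one color per person
    breaks = c("K", "J")                # legend order: K first
  )
p
```

With breaks set this way the data column can stay a plain character vector.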



Hope this helps,

Rui Barradas


