[R] ggplot: add percentage for each element in legend and remove tick mark

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Sat Aug 14 03:29:15 CEST 2021


Kai,

It is easier to help someone who generally knows what they are doing and is stuck on one specific thing. It is harder when they do not yet know enough to explain what they want, show what they did, and so on.

I modified the data you showed and hopefully it can be recreated this way:

library(tidyverse)

df <- tribble(
  ~ethnicity, ~individuals,
  "Caucasian", 36062,
  "Ashkenazi Jewish", 4309,
  "Multiple", 3193,
  "Hispanic", 2113,
  "Asian. not specified", 1538,
  "Chinese", 1031,
  "African", 643,
  "Unknown", 510,
  "Filipino", 222,
  "Japanese", 129,
  "Native American", 116,
  "Indian", 111,
  "Pacific Islander", 23)

In case it was not clear how to use dput(): if your data is already stored in a variable, like my df, you can do this:

> dput(df)
structure(list(
  ethnicity = c(
    "Caucasian",
    "Ashkenazi Jewish",
    "Multiple",
    "Hispanic",
    "Asian. not specified",
    "Chinese",
    "African",
    "Unknown",
    "Filipino",
    "Japanese",
    "Native American",
    "Indian",
    "Pacific Islander"
  ),
  individuals = c(36062, 4309, 3193, 2113,
                  1538, 1031, 643, 510, 222, 129, 116, 111, 23)
), row.names = c(NA,
                 -13L), class = c("tbl_df", "tbl", "data.frame"))   

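The structure printed above can be copied and pasted on the right-hand side of an assignment to recreate the data fairly portably, like this:

Restoring <- the.above.put.here   # replace the.above.put.here with the full structure(...) text printed by dput(df)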
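To show the whole round trip in one place, here is a minimal base-R sketch. It uses a plain data.frame (df_base, a name made up for this example) instead of a tibble so that nothing outside base R is needed. capture.output() grabs the text that dput() prints, and eval(parse(text = ...)) stands in for the copy and paste a reader would do.

```r
# A plain data.frame version of the data, so only base R is needed
df_base <- data.frame(
  ethnicity = c("Caucasian", "Ashkenazi Jewish", "Multiple", "Hispanic",
                "Asian. not specified", "Chinese", "African", "Unknown",
                "Filipino", "Japanese", "Native American", "Indian",
                "Pacific Islander"),
  individuals = c(36062, 4309, 3193, 2113, 1538, 1031, 643, 510,
                  222, 129, 116, 111, 23),
  stringsAsFactors = FALSE
)

# What you would post to the list: the text that dput() prints
dput_text <- capture.output(dput(df_base))
cat(dput_text, sep = "\n")

# What a reader does with it: paste that text after an assignment.
# eval(parse(text = ...)) plays the part of the copy and paste here.
restored <- eval(parse(text = dput_text))
identical(df_base, restored)
```

The last line should report that the restored copy is identical to the original, which is the whole point of posting dput() output instead of a printed table.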

The question you ask may better be answered by CHANGING what is in df before calling ggplot.

Be that as it may, after a lot of work on your badly formatted code as shown in plain text, I have this:

> eth
# A tibble: 13 x 5
   ethnicity            individuals fraction  ymax  ymin
   <chr>                      <dbl>    <dbl> <dbl> <dbl>
 1 Caucasian                  36062  0.721   0.721 0
 2 Ashkenazi Jewish            4309  0.0862  0.807 0.721
 3 Multiple                    3193  0.0639  0.871 0.807
 4 Hispanic                    2113  0.0423  0.914 0.871
 5 Asian. not specified        1538  0.0308  0.944 0.914
 6 Chinese                     1031  0.0206  0.965 0.944
 7 African                      643  0.0129  0.978 0.965
 8 Unknown                      510  0.0102  0.988 0.978
 9 Filipino                     222  0.00444 0.992 0.988
10 Japanese                     129  0.00258 0.995 0.992
11 Native American              116  0.00232 0.997 0.995
12 Indian                       111  0.00222 1.00  0.997
13 Pacific Islander              23  0.00046 1     1.00
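For anyone who wants to reproduce the table above, here is a minimal sketch of how those three columns can be computed with base R alone. It is the original calculation laid out one step per line with comments; eth is rebuilt here as a plain data.frame so the snippet runs on its own.

```r
eth <- data.frame(
  ethnicity = c("Caucasian", "Ashkenazi Jewish", "Multiple", "Hispanic",
                "Asian. not specified", "Chinese", "African", "Unknown",
                "Filipino", "Japanese", "Native American", "Indian",
                "Pacific Islander"),
  individuals = c(36062, 4309, 3193, 2113, 1538, 1031, 643, 510,
                  222, 129, 116, 111, 23),
  stringsAsFactors = FALSE
)

# Share of the total for each row
eth$fraction <- eth$individuals / sum(eth$individuals)
# Top of each rectangle: running total of the shares (ends at 1)
eth$ymax <- cumsum(eth$fraction)
# Bottom of each rectangle: the previous row's top (starts at 0)
eth$ymin <- c(0, head(eth$ymax, n = -1))

eth
```

Each rectangle of the donut then spans ymin to ymax, so the slices stack end to end around the circle.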

I used your ggplot code, reformatted so people can read and run it as:

ggplot(eth, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=ethnicity)) +
  geom_rect() +
  coord_polar(theta="y")  +
  xlim(c(2, 4))

It shows a donut plot, which I cannot easily share on this text-only list. You want to change the legend by adding more to it. There are tons of ways to do that, BUT I am not sure exactly what you want.

ONE WAY to do what you want is to make a new column like this:

> eth$label <- paste(eth$ethnicity, " ", eth$fraction*100, "%", sep="")
> eth
# A tibble: 13 x 6
   ethnicity            individuals fraction  ymax  ymin label
   <chr>                      <dbl>    <dbl> <dbl> <dbl> <chr>
 1 Caucasian                  36062  0.721   0.721 0     Caucasian 72.124%
 2 Ashkenazi Jewish            4309  0.0862  0.807 0.721 Ashkenazi Jewish 8.618%
 3 Multiple                    3193  0.0639  0.871 0.807 Multiple 6.386%
 4 Hispanic                    2113  0.0423  0.914 0.871 Hispanic 4.226%
 5 Asian. not specified        1538  0.0308  0.944 0.914 Asian. not specified 3.076%
 6 Chinese                     1031  0.0206  0.965 0.944 Chinese 2.062%
 7 African                      643  0.0129  0.978 0.965 African 1.286%
 8 Unknown                      510  0.0102  0.988 0.978 Unknown 1.02%
 9 Filipino                     222  0.00444 0.992 0.988 Filipino 0.444%
10 Japanese                     129  0.00258 0.995 0.992 Japanese 0.258%
11 Native American              116  0.00232 0.997 0.995 Native American 0.232%
12 Indian                       111  0.00222 1.00  0.997 Indian 0.222%
13 Pacific Islander              23  0.00046 1     1.00  Pacific Islander 0.046%

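Now, once the labels look exactly the way you want, you need to ask ggplot to substitute them and make sure they line up with the right colors. That is the tricky part: ethnicity is a character column, so ggplot orders the legend alphabetically rather than in the row order of eth, and a plain vector of labels is matched to the legend keys by position. Either make ethnicity a factor with its levels in row order, or pass the labels as a vector named by ethnicity. You may also want to round the percentages so they all have the same number of decimals. scale_fill_discrete() can change other things too, such as replacing the legend title "ethnicity" with another phrase through its name argument.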
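Here is a minimal base-R sketch of the rounding and ordering points above. sprintf() gives every percentage the same number of decimals (two is an arbitrary choice; with one decimal, Pacific Islander would show as 0.0%), and turning ethnicity into a factor whose levels follow the row order makes ggplot list the legend in that same order.

```r
eth <- data.frame(
  ethnicity = c("Caucasian", "Ashkenazi Jewish", "Multiple", "Hispanic",
                "Asian. not specified", "Chinese", "African", "Unknown",
                "Filipino", "Japanese", "Native American", "Indian",
                "Pacific Islander"),
  individuals = c(36062, 4309, 3193, 2113, 1538, 1031, 643, 510,
                  222, 129, 116, 111, 23),
  stringsAsFactors = FALSE
)
eth$fraction <- eth$individuals / sum(eth$individuals)

# Same number of decimals for every entry, e.g. "Caucasian 72.12%"
eth$label <- sprintf("%s %.2f%%", eth$ethnicity, eth$fraction * 100)

# Factor levels in row order, so the legend is no longer alphabetical
eth$ethnicity <- factor(eth$ethnicity, levels = eth$ethnicity)

# Each level now sits next to its own label
data.frame(level = levels(eth$ethnicity), label = eth$label)
```

With ethnicity converted this way, even an unnamed labels = eth$label in scale_fill_discrete() is matched to the right keys, because the legend now follows the row order.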

Here is the additional part of ggplot that makes the change:

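ggplot(eth, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=ethnicity)) +
  geom_rect() +
  coord_polar(theta="y") +
  xlim(c(2, 4)) +
  # naming the labels by ethnicity matches each one to its own legend key
  # (an unnamed vector is matched by position to the alphabetical keys)
  scale_fill_discrete(labels = setNames(eth$label, eth$ethnicity))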

Removing the tick mark text can be done by setting the right elements of a theme as in the following:

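ggplot(eth, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=ethnicity)) +
  geom_rect() +
  coord_polar(theta="y") +
  xlim(c(2, 4)) +
  # named labels so each one lands on the right legend key
  scale_fill_discrete(labels = setNames(eth$label, eth$ethnicity)) +
  theme(axis.ticks = element_blank(),  # the small tick marks
        axis.text = element_blank())   # the numbers around the plot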

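You may not need both settings: axis.text = element_blank() is what removes the numbers, while axis.ticks = element_blank() removes the small tick marks, so experiment to see which you want.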
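Putting the pieces together, here is one possible complete version, a sketch rather than the only way. Instead of overriding the labels in scale_fill_discrete(), it maps fill directly to a label column that is a factor in row order, so the legend text and colors cannot get out of step. It also uses theme_void(), a built-in ggplot2 theme that drops the axis text, tick marks and grid lines in one go. The legend title "Ethnicity" passed to labs() is just a placeholder.

```r
library(ggplot2)

eth <- data.frame(
  ethnicity = c("Caucasian", "Ashkenazi Jewish", "Multiple", "Hispanic",
                "Asian. not specified", "Chinese", "African", "Unknown",
                "Filipino", "Japanese", "Native American", "Indian",
                "Pacific Islander"),
  individuals = c(36062, 4309, 3193, 2113, 1538, 1031, 643, 510,
                  222, 129, 116, 111, 23),
  stringsAsFactors = FALSE
)

# Donut geometry, as in the original code
eth$fraction <- eth$individuals / sum(eth$individuals)
eth$ymax <- cumsum(eth$fraction)
eth$ymin <- c(0, head(eth$ymax, n = -1))

# Legend text with rounded percentages, kept in row order
eth$label <- sprintf("%s %.2f%%", eth$ethnicity, eth$fraction * 100)
eth$label <- factor(eth$label, levels = eth$label)

p <- ggplot(eth, aes(ymax = ymax, ymin = ymin, xmax = 4, xmin = 3,
                     fill = label)) +
  geom_rect() +
  coord_polar(theta = "y") +
  xlim(c(2, 4)) +
  labs(fill = "Ethnicity") +   # otherwise the legend title would be "label"
  theme_void()                 # no axis text, tick marks or grid lines

print(p)
```

Mapping fill to the label itself trades a little flexibility (the colors are now keyed to the text) for never having to keep two vectors in sync.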

As this is a text-only list, I can send you an attachment showing the output personally.




-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Kai Yang via R-help
Sent: Friday, August 13, 2021 5:48 PM
To: John Kane <jrkrideau using gmail.com>
Cc: R-help Mailing List <r-help using r-project.org>
Subject: Re: [R] ggplot: add percentage for each element in legend and remove tick mark

Hello John,
I put my testing data below. I'm not sure how to use the dput() function. Would you please give me an example?
Thanks,
Kai

| ethnicity | individuals |
| Caucasian | 36062 |
| Ashkenazi Jewish | 4309 |
| Multiple | 3193 |
| Hispanic | 2113 |
| Asian. not specified | 1538 |
| Chinese | 1031 |
| African | 643 |
| Unknown | 510 |
| Filipino | 222 |
| Japanese | 129 |
| Native American | 116 |
| Indian | 111 |
| Pacific Islander | 23 |



On Friday, August 13, 2021, 06:21:29 AM PDT, John Kane <jrkrideau using gmail.com> wrote:

Would you supply some sample data please? A handy way to supply sample data is to use the dput() function. See ?dput. If you have a very large data set, then something like dput(head(myfile, 100)) will likely supply enough data for us to work with.

On Thu, 12 Aug 2021 at 11:45, Kai Yang via R-help <r-help using r-project.org> wrote:
>
> Hello List,
> I use the following code to generate a donut plot.
> # Compute percentages
> eth$fraction = eth$individuals / sum(eth$individuals)
> # Compute the cumulative percentages (top of each rectangle)
> eth$ymax = cumsum(eth$fraction)
> # Compute the bottom of each rectangle
> eth$ymin = c(0, head(eth$ymax, n=-1))
> # Make the plot using percentage
> ggplot(eth, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=ethnicity)) +
>   geom_rect() +
>   coord_polar(theta="y") +
>   xlim(c(2, 4))
>
> I want to improve the plot in two ways:
> 1. the legend: I need to add the percentage (eth$fraction * 100, followed by %) for each element.
> 2. remove all the numbers (tick marks?) around the plot.
> Please help. Thank you,
> Kai
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
John Kane
Kingston ON Canada
  


