[R] Including percentage values inside columns of a histogram

Tue Aug 17 22:58:10 CEST 2021

Ah yes. Duhhh...  Thanks Rui.

So h$density *diff(h$breaks) *100 will give the percentages. No need
for arithmetic beyond that.
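
A minimal sketch of that identity, for anyone reading along. The data here are synthetic (runif) and the breaks are made up for illustration; neither comes from the thread's datasetregs. It only checks that density times bin width matches counts/sum(counts), for equal and for unequal bin widths, and then uses the result as bar labels.

```r
# Synthetic data: an assumption for illustration, not the thread's dataset
set.seed(1)
x <- runif(200, 0, 100)

# Equal-width bins of width 20, so the densities alone do not sum to 1
h <- hist(x, breaks = seq(0, 100, by = 20), plot = FALSE)
pct_density <- h$density * diff(h$breaks) * 100   # percentages from densities
pct_counts  <- h$counts / sum(h$counts) * 100     # percentages from counts

# Unequal-width bins (hypothetical breaks): the same identity holds
h2 <- hist(x, breaks = c(0, 10, 30, 60, 100), plot = FALSE)
pct2 <- h2$density * diff(h2$breaks) * 100

# plot() dispatches to plot.histogram, whose `labels` argument
# accepts a character vector of bar labels
plot(h, labels = paste0(round(pct_density, 1), "%"),
     ylim = c(0, 1.1 * max(h$counts)))
```

With bins of width 1 the multiplication by diff(h$breaks) changes nothing, which is the only case where h$density*100 alone would give the percentages.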

Bert

On Tue, Aug 17, 2021 at 12:03 PM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>
> Hello,
>
>
>
> At 19:28 on 17/08/21, Bert Gunter wrote:
> > Inline below.
> >
> >
> >
> > On Tue, Aug 17, 2021 at 4:09 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >>
> >> Hello,
> >>
> >> I had forgotten about plot.histogram, it does make everything simpler.
> >> To have percentages on the bars, in the code below I use package scales.
> >>
> >> Note that it seems to me you do not want densities. To get
> >> percentages, the proportions of counts are given by either of
> >
> > Under the default of equal width bins -- which is what Sturges gives
>
> Right.
>
> > if I read the docs correctly -- since the densities sum to 1,
>
> The "densities" do not sum to 1. From ?hist, section Value:
>
> density
> values f^(x[i]), as estimated density values. If all(diff(breaks) == 1),
> they are the relative frequencies counts/n and in general satisfy
> sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].
>
>
> If all(diff(breaks) == 1) is FALSE, the density list member must be
> multiplied by diff(.$breaks)
>
>
> h <- hist(datasetregs$Amount, plot = FALSE)
> sum(h$density)
> #[1] 1e-04
> diff(h$breaks)
> #[1] 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000
> sum(h$density*diff(h$breaks))
> #[1] 1
>
>
> Hope this helps,
>
> Rui Barradas
>
> > they are already the proportion of counts in each histogram bin, no?
> >
> > -- Bert
> >
> >
> >>
> >> h$counts/sum(h$counts)
> >> h$density*diff(h$breaks)
> >>
> >>
> >>
> >> # One histogram for all dates
> >> h <- hist(datasetregs$Amount, plot = FALSE)
> >> plot(h, labels = scales::percent(h$counts/sum(h$counts)),
> >>        ylim = c(0, 1.1*max(h$counts)))
> >>
> >>
> >>
> >> # Histograms by date
> >> sp <- split(datasetregs, datasetregs$Date)
> >> old_par <- par(mfrow = c(1, 3))
> >> h_list <- lapply(seq_along(sp), function(i){
> >>     hist_title <- paste("Histogram of", names(sp)[i])
> >>     h <- hist(sp[[i]]$Amount, plot = FALSE)
> >>     plot(h, main = hist_title, xlab = "Amount",
> >>          labels = scales::percent(h$counts/sum(h$counts)),
> >>          ylim = c(0, 1.1*max(h$counts)))
> >> })
> >> par(old_par)
> >>
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >> At 01:49 on 17/08/21, Bert Gunter wrote:
> >>> I may well misunderstand, but proffered solutions seem more complicated
> >>> than necessary.
> >>> Note that the return of hist() can be saved as a list of class "histogram"
> >>> and then plotted with  plot.histogram(), which already has a "labels"
> >>> argument that seems to be what you want. A simple example is:
> >>>
> >>> dat <- runif(50, 0, 10)
> >>> myhist <- hist(dat, freq = TRUE, breaks ="Sturges")
> >>>
> >>> plot(myhist, col = "darkgray",
> >>>        labels = as.character(round(myhist$density*100,1) ),
> >>>        ylim = c(0, 1.1*max(myhist$counts)))
> >>> ## note that this is plot.histogram because myhist has class "histogram"
> >>>
> >>> Note that I expanded the y axis a bit to be sure to include the labels. You
> >>> can, of course, plot your separate years as Rui has indicated or via e.g.
> >>> ?layout.
> >>>
> >>> Apologies if I have misunderstood. Just ignore this in that case.
> >>> Otherwise, I leave it to you to fill in details.
> >>>
> >>> Bert Gunter
> >>>
> >>> "The trouble with having an open mind is that people keep coming along and
> >>> sticking things into it."
> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>
> >>>
> >>> On Mon, Aug 16, 2021 at 4:14 PM Paul Bernal <paulbernal07 using gmail.com> wrote:
> >>>
> >>>> Dear Jim,
> >>>>
> >>>> Thank you so much for your kind reply. Yes, this is what I am looking for,
> >>>> however, I can't see clearly how the bars correspond to the bins in the
> >>>> x-axis. Maybe there is a way to align the amounts so that they match the
> >>>> columns, sorry if I sound picky, but just want to learn if there is a way
> >>>> to accomplish this.
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Paul
> >>>>
> >>>> On Mon, 16 Aug 2021 at 17:57, Jim Lemon (<drjimlemon using gmail.com>)
> >>>> wrote:
> >>>>
> >>>>> Hi Paul,
> >>>>> I just worked out your first request:
> >>>>>
> >>>>> datasetregs<-structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> >>>>> 2L,
> >>>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"), class =
> >>>>> "factor"),
> >>>>>       Amount = c(40100, 101100, 35000, 40100, 15000, 45100, 40200,
> >>>>>       15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100, 15000,
> >>>>>       15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>       15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>       15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>       16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000, 15000,
> >>>>>       15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000, 15000,
> >>>>>       15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000, 15000,
> >>>>>       15000, 15000, 15000, 15000)), row.names = c(NA, -74L), class =
> >>>>> "data.frame")
> >>>>> histval<-with(datasetregs, hist(Amount, groups=Date, scale="frequency",
> >>>>>    breaks="Sturges", col="darkgray"))
> >>>>> library(plotrix)
> >>>>> histpcts<-paste0(round(100*histval$counts/sum(histval$counts),1),"%")
> >>>>> barlabels(histval$mids,histval$counts,histpcts)
> >>>>>
> >>>>> I think that's what you asked for.
> >>>>>
> >>>>> Jim
> >>>>>
> >>>>> On Tue, Aug 17, 2021 at 8:44 AM Paul Bernal <paulbernal07 using gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> This is way better, now, how could I put the frequency labels in the
> >>>>>> columns as a percentage, instead of presenting them as counts?
> >>>>>>
> >>>>>> Thank you so much.
> >>>>>>
> >>>>>> Paul
> >>>>>>
> >>>>>> On Mon, 16 Aug 2021 at 17:33, Rui Barradas (<ruipbarradas using sapo.pt>)
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> You forgot to cc the list.
> >>>>>>>
> >>>>>>> Here are two ways; both apply hist() and text() to Amount split
> >>>>>>> by Date. The return value of hist() is saved because it is a list whose
> >>>>>>> members include the bars' midpoints and the counts. Those are used to
> >>>>>>> know where to put the text labels.
> >>>>>>> A vector lbls is created to get rid of counts of zero.
> >>>>>>>
> >>>>>>> The main difference between the two ways is the histogram's titles.
> >>>>>>>
> >>>>>>>
> >>>>>>> old_par <- par(mfrow = c(1, 3))
> >>>>>>> h_list <- with(datasetregs, tapply(Amount, Date, function(x){
> >>>>>>>      h <- hist(x)
> >>>>>>>      lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
> >>>>>>>      text(h$mids, h$counts/2, labels = lbls)
> >>>>>>> }))
> >>>>>>> par(old_par)
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> old_par <- par(mfrow = c(1, 3))
> >>>>>>> sp <- split(datasetregs, datasetregs$Date)
> >>>>>>> h_list <- lapply(seq_along(sp), function(i){
> >>>>>>>      hist_title <- paste("Histogram of", names(sp)[i])
> >>>>>>>      h <- hist(sp[[i]]$Amount, main = hist_title)
> >>>>>>>      lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
> >>>>>>>      text(h$mids, h$counts/2, labels = lbls)
> >>>>>>> })
> >>>>>>> par(old_par)
> >>>>>>>
> >>>>>>>
> >>>>>>> Hope this helps,
> >>>>>>>
> >>>>>>> Rui Barradas
> >>>>>>>
> >>>>>>> At 23:16 on 16/08/21, Paul Bernal wrote:
> >>>>>>>> Dear Rui,
> >>>>>>>>
> >>>>>>>> The hist() function comes from the graphics package, from what I could
> >>>>>>>> see. The thing is that I want to divide the Amount column into several
> >>>>>>>> bins and then generate three different histograms, one for each AF
> >>>>>>>> period (AF refers to fiscal years). As you can see, the data contains
> >>>>>>>> three fiscal years (2017, 2020 and 2021). I want to see the percentage
> >>>>>>>> of cases that fall into different amount categories, from 15,000 and
> >>>>>>>> below, 16,000 to 17,000, from 18,000 to 19,000, and so on.
> >>>>>>>>
> >>>>>>>> Thanks for your kind help.
> >>>>>>>>
> >>>>>>>> Paul
> >>>>>>>>
> >>>>>>>> On Mon, 16 Aug 2021 at 17:07, Rui Barradas (<ruipbarradas using sapo.pt
> >>>>>>>> <mailto:ruipbarradas using sapo.pt>>) wrote:
> >>>>>>>>
> >>>>>>>>       Hello,
> >>>>>>>>
> >>>>>>>>       The function Hist comes from what package?
> >>>>>>>>
> >>>>>>>>       Are you sure you don't want a bar plot?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       agg <- aggregate(Amount ~ Date, datasetregs, sum)
> >>>>>>>>       bp <- barplot(Amount ~ Date, agg)
> >>>>>>>>       with(agg, text(bp, Amount/2, labels = Amount))
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>       Hope this helps,
> >>>>>>>>
> >>>>>>>>       Rui Barradas
> >>>>>>>>
> >>>>>>>>       At 22:54 on 16/08/21, Paul Bernal wrote:
> >>>>>>>>        > Hello everyone,
> >>>>>>>>        >
> >>>>>>>>        > I am currently working with R version 4.1.0 and I am trying to
> >>>>>>>>        > include (inside the columns of the histogram) the percentage
> >>>>>>>>        > distribution, and I want to generate three histograms, one for
> >>>>>>>>        > each fiscal year (in the Date column, there are three fiscal
> >>>>>>>>        > years: AF 2017, AF 2020 and AF 2021). However, I can't seem to
> >>>>>>>>        > accomplish this.
> >>>>>>>>        >
> >>>>>>>>        > Here is my data:
> >>>>>>>>        >
> >>>>>>>>        > structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> >>>> 2L,
> >>>>>>>>        > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L,
> >>>>>>>>        > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> >>>>> 2L,
> >>>>>>>>        > 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L,
> >>>>>>>>        > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
> >>>>> 3L,
> >>>>>>>>        > 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"),
> >>>>> class =
> >>>>>>>>        > "factor"),
> >>>>>>>>        >      Amount = c(40100, 101100, 35000, 40100, 15000, 45100,
> >>>>> 40200,
> >>>>>>>>        >      15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100,
> >>>>>>> 15000,
> >>>>>>>>        >      15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>>        >      15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>>        >      15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>>        >      16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>>        >      15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000,
> >>>>>>> 15000,
> >>>>>>>>        >      15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000,
> >>>>>>> 15000,
> >>>>>>>>        >      15000, 15000, 15000, 15000)), row.names = c(NA, -74L),
> >>>>> class
> >>>>>>> =
> >>>>>>>>        > "data.frame")
> >>>>>>>>        >
> >>>>>>>>        > I would like to modify the following script:
> >>>>>>>>        >
> >>>>>>>>        >> with(datasetregs, Hist(Amount, groups=Date, scale="frequency",
> >>>>>>>>        > +   breaks="Sturges", col="darkgray"))
> >>>>>>>>        >
> >>>>>>>>        > #The only thing missing here are the percentages corresponding to
> >>>>>>>>        > each bin (I would like to see the percentages inside each column,
> >>>>>>>>        > or on top outside if possible)
> >>>>>>>>        >
> >>>>>>>>        > Any help will be greatly appreciated.
> >>>>>>>>        >
> >>>>>>>>        > Best regards,
> >>>>>>>>        >
> >>>>>>>>        > Paul.
> >>>>>>>>        >
> >>>>>>>>        >       [[alternative HTML version deleted]]
> >>>>>>>>        >
> >>>>>>>>        > ______________________________________________
> >>>>>>>>        > R-help using r-project.org <mailto:R-help using r-project.org> mailing
> >>>>> list
> >>>>>>>>       -- To UNSUBSCRIBE and more, see
> >>>>>>>>        > https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>       <https://stat.ethz.ch/mailman/listinfo/r-help>
> >>>>>>>>        > PLEASE do read the posting guide
> >>>>>>>>       http://www.R-project.org/posting-guide.html
> >>>>>>>>       <http://www.R-project.org/posting-guide.html>
> >>>>>>>>        > and provide commented, minimal, self-contained, reproducible
> >>>>> code.
> >>>>>>>>        >
> >>>>>>>>