[R] Including percentage values inside columns of a histogram

Rui Barradas ruipbarradas using sapo.pt
Tue Aug 17 13:09:49 CEST 2021


Hello,

I had forgotten about plot.histogram; it does make everything simpler.
To have percentages on the bars, in the code below I use package scales.

Note that it seems to me that you do not want densities. To have
percentages, use the proportions of counts, which are given by either of

h$counts/sum(h$counts)
h$density*diff(h$breaks)
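
A quick check that the two expressions agree, as a sketch on simulated data (the vector x and the seed are made up for illustration; they are not from this thread):

```r
# Simulated data, only for illustration
set.seed(2021)
x <- runif(50, 0, 10)

h <- hist(x, plot = FALSE)

# Proportion of observations in each bin, computed two ways
p_counts  <- h$counts / sum(h$counts)
p_density <- h$density * diff(h$breaks)

all.equal(p_counts, p_density)   # compare the two vectors
sum(p_counts)                    # total of the proportions

# The density by itself is a proportion only when every bin has width 1
unique(diff(h$breaks))
```

So multiplying the density by 100 gives percentages only in the special case of unit-width bins; otherwise the bin widths must be multiplied in first.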



# One histogram for all dates
h <- hist(datasetregs$Amount, plot = FALSE)
plot(h, labels = scales::percent(h$counts/sum(h$counts)),
      ylim = c(0, 1.1*max(h$counts)))



# Histograms by date
sp <- split(datasetregs, datasetregs$Date)
old_par <- par(mfrow = c(1, 3))
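h_list <- lapply(seq_along(sp), function(i){
   hist_title <- paste("Histogram of", names(sp)[i])
   # compute the histogram without drawing it
   h <- hist(sp[[i]]$Amount, plot = FALSE)
   # draw it with percentage labels and some headroom for them
   plot(h, main = hist_title, xlab = "Amount",
        labels = scales::percent(h$counts/sum(h$counts)),
        ylim = c(0, 1.1*max(h$counts)))
   h  # return the histogram object, plot() itself returns NULL
})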
par(old_par)
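
If package scales is not available, the labels can also be built in base R. A minimal sketch, assuming one decimal place is wanted; pct_labels is a hypothetical helper name and the amounts are simulated, neither comes from this thread. It also blanks the labels of empty bins.

```r
# Hypothetical helper: percentage labels from an object of class "histogram"
pct_labels <- function(h, digits = 1) {
  p <- 100 * h$counts / sum(h$counts)
  lbls <- paste0(formatC(p, format = "f", digits = digits), "%")
  lbls[h$counts == 0] <- ""   # no label on empty bins
  lbls
}

# Simulated amounts, only for illustration
set.seed(1)
amount <- 15000 + round(rexp(74, rate = 1/20000))

h <- hist(amount, plot = FALSE)
plot(h, labels = pct_labels(h), xlab = "Amount",
     ylim = c(0, 1.1 * max(h$counts)))
```

The same helper could be passed as the labels argument inside the lapply() loop above in place of scales::percent().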


Hope this helps,

Rui Barradas

At 01:49 on 17/08/21, Bert Gunter wrote:
> I may well misunderstand, but proffered solutions seem more complicated
> than necessary.
> Note that the return of hist() can be saved as a list of class "histogram"
> and then plotted with  plot.histogram(), which already has a "labels"
> argument that seems to be what you want. A simple example is:
> 
> dat <- runif(50, 0, 10)
> myhist <- hist(dat, freq = TRUE, breaks ="Sturges")
> 
> plot(myhist, col = "darkgray",
>       labels = as.character(round(myhist$density*100,1) ),
>       ylim = c(0, 1.1*max(myhist$counts)))
> ## note that this is plot.histogram because myhist has class "histogram"
> 
> Note that I expanded the y axis a bit to be sure to include the labels. You
> can, of course, plot your separate years as Rui has indicated or via e.g.
> ?layout.
> 
> Apologies if I have misunderstood. Just ignore this in that case.
> Otherwise, I leave it to you to fill in details.
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Mon, Aug 16, 2021 at 4:14 PM Paul Bernal <paulbernal07 using gmail.com> wrote:
> 
>> Dear Jim,
>>
>> Thank you so much for your kind reply. Yes, this is what I am looking for;
>> however, I can't see clearly how the bars correspond to the bins on the
>> x-axis. Maybe there is a way to align the amounts so that they match the
>> columns. Sorry if I sound picky, but I just want to learn if there is a way
>> to accomplish this.
>>
>> Best regards,
>>
>> Paul
>>
>> On Mon, 16 Aug 2021 at 17:57, Jim Lemon (<drjimlemon using gmail.com>)
>> wrote:
>>
>>> Hi Paul,
>>> I just worked out your first request:
>>>
>>> datasetregs<-structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>> 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>>> 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"), class =
>>> "factor"),
>>>      Amount = c(40100, 101100, 35000, 40100, 15000, 45100, 40200,
>>>      15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100, 15000,
>>>      15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000, 15000,
>>>      15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
>>>      15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000, 15000,
>>>      16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000, 15000,
>>>      15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000, 15000,
>>>      15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000, 15000,
>>>      15000, 15000, 15000, 15000)), row.names = c(NA, -74L), class =
>>> "data.frame")
>>> histval<-with(datasetregs, hist(Amount, breaks="Sturges", col="darkgray"))
>>> library(plotrix)
>>> histpcts<-paste0(round(100*histval$counts/sum(histval$counts),1),"%")
>>> barlabels(histval$mids,histval$counts,histpcts)
>>>
>>> I think that's what you asked for.
>>>
>>> Jim
>>>
>>> On Tue, Aug 17, 2021 at 8:44 AM Paul Bernal <paulbernal07 using gmail.com>
>>> wrote:
>>>>
>>>> This is way better. Now, how could I put the frequency labels in the
>>>> columns as percentages, instead of presenting them as counts?
>>>>
>>>> Thank you so much.
>>>>
>>>> Paul
>>>>
>>>> On Mon, 16 Aug 2021 at 17:33, Rui Barradas (<ruipbarradas using sapo.pt>)
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> You forgot to cc the list.
>>>>>
>>>>> Here are two ways; both of them apply hist() and text() to Amount split
>>>>> by Date. The return value of hist() is saved because it is a list whose
>>>>> members include the midpoints of the histogram's bars and the counts.
>>>>> Those are used to know where to put the text labels.
>>>>> A vector lbls is created to get rid of counts of zero.
>>>>>
>>>>> The main difference between the two ways is the histograms' titles.
>>>>>
>>>>>
>>>>> old_par <- par(mfrow = c(1, 3))
>>>>> h_list <- with(datasetregs, tapply(Amount, Date, function(x){
>>>>>     h <- hist(x)
>>>>>     lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
>>>>>     text(h$mids, h$counts/2, labels = lbls)
>>>>> }))
>>>>> par(old_par)
>>>>>
>>>>>
>>>>>
>>>>> old_par <- par(mfrow = c(1, 3))
>>>>> sp <- split(datasetregs, datasetregs$Date)
>>>>> h_list <- lapply(seq_along(sp), function(i){
>>>>>     hist_title <- paste("Histogram of", names(sp)[i])
>>>>>     h <- hist(sp[[i]]$Amount, main = hist_title)
>>>>>     lbls <- ifelse(h$counts == 0, NA_integer_, h$counts)
>>>>>     text(h$mids, h$counts/2, labels = lbls)
>>>>> })
>>>>> par(old_par)
>>>>>
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Rui Barradas
>>>>>
>>>>> At 23:16 on 16/08/21, Paul Bernal wrote:
>>>>>> Dear Rui,
>>>>>>
>>>>>> The hist() function comes from the graphics package, from what I could
>>>>>> see. The thing is that I want to divide the Amount column into several
>>>>>> bins and then generate three different histograms, one for each AF
>>>>>> period (AF refers to fiscal years). As you can see, the data contains
>>>>>> three fiscal years (2017, 2020 and 2021). I want to see the percentage
>>>>>> of cases that fall into different amount categories, from 15,000 and
>>>>>> below, 16,000 to 17,000, from 18,000 to 19,000, and so on.
>>>>>>
>>>>>> Thanks for your kind help.
>>>>>>
>>>>>> Paul
>>>>>>
>>>>>> On Mon, 16 Aug 2021 at 17:07, Rui Barradas (<ruipbarradas using sapo.pt>) wrote:
>>>>>>
>>>>>>      Hello,
>>>>>>
>>>>>>      The function Hist comes from what package?
>>>>>>
>>>>>>      Are you sure you don't want a bar plot?
>>>>>>
>>>>>>
>>>>>>      agg <- aggregate(Amount ~ Date, datasetregs, sum)
>>>>>>      bp <- barplot(Amount ~ Date, agg)
>>>>>>      with(agg, text(bp, Amount/2, labels = Amount))
>>>>>>
>>>>>>
>>>>>>      Hope this helps,
>>>>>>
>>>>>>      Rui Barradas
>>>>>>
>>>>>>      At 22:54 on 16/08/21, Paul Bernal wrote:
>>>>>>       > Hello everyone,
>>>>>>       >
>>>>>>       > I am currently working with R version 4.1.0 and I am trying to
>>>>>>       > include (inside the columns of the histogram) the percentage
>>>>>>       > distribution, and I want to generate three histograms, one for
>>>>>>       > each fiscal year (in the Date column, there are three fiscal
>>>>>>       > years: AF 2017, AF 2020 and AF 2021). However, I can't seem to
>>>>>>       > accomplish this.
>>>>>>       >
>>>>>>       > Here is my data:
>>>>>>       >
>>>>>>       > structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
>>>>>>       > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>>>>>       > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>>>>>       > 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>>>>>>       > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
>>>>>>       > 3L, 3L, 3L), .Label = c("AF 2017", "AF 2020", "AF 2021"),
>>>>>>       > class = "factor"),
>>>>>>       >      Amount = c(40100, 101100, 35000, 40100, 15000, 45100, 40200,
>>>>>>       >      15000, 35000, 35100, 20300, 40100, 15000, 67100, 17100, 15000,
>>>>>>       >      15000, 50100, 35100, 15000, 15000, 15000, 15000, 15000, 15000,
>>>>>>       >      15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000, 15000,
>>>>>>       >      15000, 15000, 20100, 15000, 15000, 15000, 15000, 15000, 15000,
>>>>>>       >      16600, 15000, 15000, 15700, 15000, 15000, 15000, 15000, 15000,
>>>>>>       >      15000, 15000, 15000, 15000, 20200, 21400, 25100, 15000, 15000,
>>>>>>       >      15000, 15000, 15000, 15000, 25600, 15000, 15000, 15000, 15000,
>>>>>>       >      15000, 15000, 15000, 15000)), row.names = c(NA, -74L),
>>>>>>       > class = "data.frame")
>>>>>>       >
>>>>>>       > I would like to modify the following script:
>>>>>>       >
>>>>>>       > with(datasetregs, Hist(Amount, groups=Date, scale="frequency",
>>>>>>       >   breaks="Sturges", col="darkgray"))
>>>>>>       >
>>>>>>       > # The only thing missing here are the percentages corresponding to
>>>>>>       > # each bin (I would like to see the percentages inside each column,
>>>>>>       > # or on top outside if possible)
>>>>>>       >
>>>>>>       > Any help will be greatly appreciated.
>>>>>>       >
>>>>>>       > Best regards,
>>>>>>       >
>>>>>>       > Paul.
>>>>>>       >
>>>>>>       >       [[alternative HTML version deleted]]
>>>>>>       >
>>>>>>       > ______________________________________________
>>>>>>       > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>       > https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>       > PLEASE do read the posting guide
>>>>>>       > http://www.R-project.org/posting-guide.html
>>>>>>       > and provide commented, minimal, self-contained, reproducible code.
>>>>>>       >


