[R] distribution of daily rainfall values in binned categories

Martin Maechler maechler at stat.math.ethz.ch
Wed Jun 28 10:39:58 CEST 2006


>>>>> "FJZ" == Francisco J Zagmutt <gerifalte28 at hotmail.com>
>>>>>     on Wed, 28 Jun 2006 03:51:31 +0000 writes:

    FJZ> Hi Etienne,
    FJZ> Somebody asked a somewhat related question recently.
    FJZ> http://tolstoy.newcastle.edu.au/R/help/06/06/29485.html

    FJZ> Take a look at ?cut, ?table and ?barplot,
    FJZ> for example:

      # Creates fake data from uniform(0,30)
      set.seed(1) ## <<- added by MM
      x <- runif(50, 0, 30)

      # Creates categories
      rain <- cut(x, breaks = c(0, 1, 2.5, 5, 10, 20, Inf))

      # Creates contingency table of categories
      tab <- table(rain)

      # Plots frequencies of rainfall
      barplot(tab)


No, no, no!  Do not confuse histograms with bar plots!

-  barplot() is {one possibility} for visualizing discrete
   ("categorical", "factor") data,
-  hist() is for visualizing *continuous* data  (*)

As Jim Porzak replied, do use hist(): the example really is a matter
of visualization of a continuous distribution which should *not*
be done by a barplot.  Instead, e.g.,

  hist(x, breaks = c(0, 1,2.5,5, 10,20, max(pretty(max(x)))),
       freq = TRUE, col = "gray")

will give a graphic similar to the above, BUT it also
warns you about the hidden deception (aka silliness) of *both* graphics.
Namely, the above hist() call warns you with

>> Warning message:
>> the AREAS in the plot are wrong -- rather use freq=FALSE in: ....
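
To make the warning concrete, here is a minimal sketch (not part of the
original exchange) that tabulates, per bin, the width, the count and the
density that hist() computes. The finite upper break of 30 is my assumption,
chosen only because runif(50, 0, 30) cannot exceed it; the names brk, width
and cmp are made up for illustration.

```r
set.seed(1)
x <- runif(50, 0, 30)

# Assumption: a finite upper break of 30 instead of Inf, so hist() can use it
brk <- c(0, 1, 2.5, 5, 10, 20, 30)

# Counts via cut() + table(), i.e. the heights drawn by barplot(tab)
tab <- table(cut(x, breaks = brk))

# Same bins via hist(), without drawing anything
h <- hist(x, breaks = brk, plot = FALSE)

# Bin widths differ by a factor of 10 between the first and the last two bins
width <- diff(brk)

# Side-by-side comparison: equal counts, but density = count / (n * width)
cmp <- data.frame(bin     = names(tab),
                  width   = width,
                  count   = h$counts,
                  density = h$density)
print(cmp)
```

The counts agree with table(cut(.)), but a bar of height `count` over a bin
of width 10 covers ten times the area of the same height over a bin of
width 1. Only the `density` column gives bar areas that sum to 1.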

and finally,

  hist(x, breaks = c(0, 1,2.5,5, 10,20, max(pretty(max(x)))), col="gray")

gives you a more honest graphic: with unequal breaks, hist() defaults to
freq = FALSE and plots densities, so the bar areas (not the heights) are
proportional to the counts.  For the runif() example, this may finally
lead you to realize that using unequal breaks may not be such a good idea.
Note, however, that for the OP's rainfall data that may well be different,
and when I look at rainfall data, I find I would rather view

   hist(log10( <rainfall> ))
or then
   plot(density( log10( <rainfall> ) ))
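
A minimal sketch of the two calls above, on made-up data, since the OP's
series is not available here.  Everything about the simulation (365 days,
about 40% wet days, log-normal amounts, the names n_days, is_wet, rain, wet
and lw) is an assumption for illustration only.  Dry days are separated
first because log10(0) is -Inf.

```r
set.seed(1)

# Hypothetical daily rainfall (mm/day): dry days are exactly 0,
# wet-day amounts are drawn from a log-normal distribution
n_days <- 365
is_wet <- runif(n_days) < 0.4
rain   <- ifelse(is_wet, rlnorm(n_days, meanlog = 1, sdlog = 1.2), 0)

# log10(0) is -Inf, so report the dry days separately and keep wet days only
wet <- rain[rain > 0]
cat("proportion of dry days:", mean(rain == 0), "\n")

lw <- log10(wet)

# Histogram and kernel density estimate on the log10 scale
hist(lw, col = "gray",
     xlab = "log10(rainfall in mm/day)", main = "Wet days only")
plot(density(lw), main = "Kernel density of log10(rainfall), wet days")
```

The proportion of dry days then has to be reported alongside the plot,
since the zeros no longer appear in it.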

Martin Maechler, ETH Zurich

(*) From a statistical point of view, histograms are just density estimators
    and, as has been known for a while, have quite a few drawbacks.
    Hence, nowadays they should often be replaced by
        plot(density(.), ..)
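
    As a rough illustration of the footnote (a sketch on the runif() data
    from the example, with default settings throughout; area_hist and
    area_kde are names I made up), one can overlay the kernel estimate on
    the density-scaled histogram and check that both integrate to about 1:

```r
set.seed(1)
x <- runif(50, 0, 30)

# Histogram on the density scale (equal-width default breaks)
h <- hist(x, freq = FALSE, col = "gray",
          main = "Histogram and kernel density estimate")

# Kernel density estimate with the default bandwidth, drawn on top
d <- density(x)
lines(d, lwd = 2)

# Both are density estimates: total area is 1 (the kernel one approximately,
# because it is evaluated on a finite grid)
area_hist <- sum(h$density * diff(h$breaks))
area_kde  <- sum(d$y) * diff(d$x[1:2])
cat("histogram area:", area_hist, " kernel estimate area:", area_kde, "\n")
```

    Note that the kernel estimate extends beyond the range of the data,
    which matters for quantities such as rainfall that cannot be negative.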


    >> From: etienne <etiennesky at yahoo.com>
    >> To: r-help at stat.math.ethz.ch
    >> Subject: [R] distribution of daily rainfall values in binned categories
    >> Date: Tue, 27 Jun 2006 11:28:59 -0700 (PDT)
    >> 
    >> Hi,
    >> 
    >> I'm a newbie in using R and I would like to have a few
    >> clues as to how I could compute and plot a
    >> distribution of daily rainfall intensity in different
    >> categories.  I have daily values (mm/day) for several
    >> years and I need to show the frequency of 0-1, 1-2.5,
    >> 2.5-5, 5-10, 10-20, 20+ mm/day.  Can this be done
    >> easily?
    >> 
    >> Thanks,
    >> Etienne
    >> 
    >> ______________________________________________
    >> R-help at stat.math.ethz.ch mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-help
    >> PLEASE do read the posting guide! 
    >> http://www.R-project.org/posting-guide.html
