[R] ... distribution based on mean values and quantiles in R?

Martin Maechler maechler at stat.math.ethz.ch
Tue Jul 19 12:15:38 CEST 2016


>>>>> Jim Lemon <drjimlemon at gmail.com>
>>>>>     on Tue, 19 Jul 2016 16:20:49 +1000 writes:

    > Hi Daniel,
    > Judging by the numbers you mention, the distribution is either very
    > skewed or not at all normal. If you look at this:

    > plot(c(0,0.012,0.015,0.057,0.07),c(0,0.05,0.4,0.05,0),type="b")

    > you will see the general shape of whatever distribution produced these
    > summary statistics. Did the paper give any hints as to what the model
    > distribution might be?

Yes, that's the correct question:  First of all, this is not about
plotting but about *fitting* a distribution with the desired
properties; once it is fitted, you can easily visualize it.

So, what were the data?  If they are 'amounts' of any kind, they
are necessarily non-negative, often strictly positive, and hence
--- according to John W Tukey --- should be analyzed after
taking logs (Tukey's "first aid transformation" for *any*
amounts data).

Taking logs and then analyzing means assuming a normal
("Gaussian") distribution for log(<data>), which is
equivalent to fitting a lognormal distribution -- R functions [dpqr]lnorm() --
to the original data.  I'd strongly recommend doing that.
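
A minimal sketch of that equivalence, for illustration only: the values
of mu and sigma below are arbitrary assumptions, not estimates from the
published numbers.  It checks that the lognormal CDF and quantile
function are the normal ones applied on the log scale.

```r
## Illustrative parameters (assumed, not taken from the data in this thread)
mu <- -4; sigma <- 0.8
x  <- c(0.005, 0.012, 0.057, 0.2)

## CDF: P(X <= x) for lognormal X equals P(log X <= log x) for normal log X
p_lnorm <- plnorm(x, meanlog = mu, sdlog = sigma)
p_norm  <- pnorm(log(x), mean = mu, sd = sigma)
cdf_ok  <- isTRUE(all.equal(p_lnorm, p_norm))

## Quantiles transform the same way: qlnorm(p) = exp(qnorm(p))
p       <- c(0.15, 0.5, 0.85)
q_ok    <- isTRUE(all.equal(qlnorm(p, mu, sigma), exp(qnorm(p, mu, sigma))))

c(cdf_ok = cdf_ok, q_ok = q_ok)
```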

And I did so, finding out, however, that if the numbers are indeed the
*mean* and the 15% and 85% quantiles, a lognormal does not
fit.  Here is the reproducible R code, with lots of
comments:


##
## MM  strongly recommends to fit a  log-normal distribution .. however it does *not* fit

## The "data statistics"
qlN <- c(q15 = 0.012, q85 = 0.057) # Quantiles original scale
mlN <- 0.015

(qn <- log(qlN))                  # Quantiles log scale [assumed normal (Gaussian) here]
## as the Gaussian is symmetric, the mid value of the two quantiles is the mean and median :
(mu <- mean(qn))   # -3.644
(medlN <- exp(mu)) # 0.02615
## an estimate for the median(.)  -- but  it is *larger* than the mean = 0.015 above !
## ===> Log-normal does *NOT* fit well :

## From the help page, we learn that
##        E[lN] = exp(mu + 1/2 sigma^2)    {and it is trivial that}
##   median[lN] = exp(mu)
## where here, our medlN is a (moment) estimate of median[lN]

## If the numbers were different, this would solve the problem :
## Consequently, a (moment / plugin) estimate for sigma is
(e12sig <- mlN / medlN) ## ~= exp( 1/2 sigma^2)
(sig2 <- 2*log(e12sig)) ## ~=  sigma^2  [--- is *NEGATIVE* (<==> 'est.median' > mean !)]
(sig <- sqrt(sig2))     ## ~=  sigma    [here of course 'NaN' with a warning !]
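
A possible cross-check, not part of the code above: the two quantiles
alone already determine both lognormal parameters, so one can compute
the mean that such a lognormal would imply and compare it with the
published one.  This is only a sketch, assuming the published values
really are the 15% and 85% quantiles; the names `sigma_q` and
`mean_implied` are introduced here for illustration.

```r
## Published summary statistics (from the original question)
qlN <- c(q15 = 0.012, q85 = 0.057)
mlN <- 0.015
p   <- c(0.15, 0.85)

qn  <- log(qlN)                                   # quantiles on the log scale
mu  <- mean(qn)                                   # symmetric probabilities => midpoint
## spread on the log scale: quantile distance divided by the standard normal one
sigma_q <- unname(diff(qn) / diff(qnorm(p)))
## mean implied by this lognormal:  E[lN] = exp(mu + 1/2 sigma^2)
mean_implied <- exp(mu + sigma_q^2 / 2)

c(mu = mu, sigma = sigma_q, mean_implied = mean_implied, mean_published = mlN)
```

The implied mean comes out more than twice the published mean, which is
the same mismatch as the negative 'sig2' above, seen from the other side.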


My conclusion would be that other distributions (than the
log-normal; the normal  is out of question !!) have to be
considered, if you want to be serious about the topic.
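
Before searching through other families, a quick feasibility check may
be worthwhile.  The following is a sketch that is not from the analysis
above and rests on two assumptions: the data are non-negative, and
0.012 and 0.057 are exact 15% and 85% quantiles.  Under these
assumptions the smallest possible mean is obtained by pushing all mass
as far to the left as the quantiles allow.

```r
## Published summary statistics (from the original question)
q   <- c(q15 = 0.012, q85 = 0.057)
p   <- c(0.15, 0.85)
mlN <- 0.015

## Left-most distribution compatible with the quantiles (assuming X >= 0):
##   15% of the mass at 0,  70% at q15,  15% at q85
mass       <- c(p[1], diff(p), 1 - p[2])
location   <- c(0, unname(q))
mean_lower <- sum(mass * location)   # lower bound for the mean of any such X

c(mean_lower = mean_lower, mean_published = mlN)
```

If this lower bound exceeds the published mean, the three numbers
cannot all hold as stated for a non-negative variable, which would
suggest rechecking how the paper defines them (e.g. rounding, or a
different summary statistic) before fitting anything.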

Maybe the poweRlaw package (https://cloud.r-project.org/package=poweRlaw)
may help you (it has 4 vignettes, the last being a nice JSS publication).

The above is a "cute" non-standard problem in any case: to fit very skewed
distributions, given two quantiles and the mean only, and the
approach taken by the "poweRlawyers", namely to minimize the KS
(Kolmogorov-Smirnov) discrepancy, seems a good start to me.

Martin Maechler,
ETH Zurich



    > Jim


    > On Tue, Jul 19, 2016 at 7:11 AM, gcchenleidenuniv
    > <gcchenleidenuniv at gmail.com> wrote:
    >> Hi all,
    >> 
    >> I need to draw density curves based on some published data. But in the article only mean value (0.015 ) and quantiles (Q0.15=0.012 , Q0.85=0.057) were given. So I was thinking if it is possible to plot density curves solely based on the mean value and quantiles. The dnorm(x, mean, sd, log) function needs the standard deviation which was not mentioned, so it can not be used in this situation.
    >> 
    >> Many thanks!!
    >> Daniel
    >> [[alternative HTML version deleted]]
    >> 
    >> ______________________________________________
    >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    >> https://stat.ethz.ch/mailman/listinfo/r-help
    >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    >> and provide commented, minimal, self-contained, reproducible code.



