[R] ... distribution based on mean values and quantiles in R?

Guangchao Chen gcchenleidenuniv at gmail.com
Wed Jul 20 15:22:52 CEST 2016


Dear Martin,

Thank you very much for your detailed explanation! I will have a look at the
poweRlaw package and see if things can be sorted out!

Best,
Daniel

On 19 July 2016 at 12:15, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> >>>>> Jim Lemon <drjimlemon at gmail.com>
> >>>>>     on Tue, 19 Jul 2016 16:20:49 +1000 writes:
>
>     > Hi Daniel,
>     > Judging by the numbers you mention, the distribution is either very
>     > skewed or not at all normal. If you look at this:
>
>     > plot(c(0,0.012,0.015,0.057,0.07),c(0,0.05,0.4,0.05,0),type="b")
>
>     > you will see the general shape of whatever distribution produced these
>     > summary statistics. Did the paper give any hints as to what the model
>     > distribution might be?
>
> Yes, that's the correct question: first, it's not about
> plotting but about *fitting* a distribution with the desired
> properties; then you can easily visualize it.
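The fit-then-plot order can be followed even when only summary statistics are available. A minimal sketch, assuming a gamma distribution (an illustrative choice, not one made in this thread) and a hypothetical loss function `discrepancy()` that sums the squared log-ratios between the fitted and the reported 15%/85% quantiles and mean. If the three reported numbers cannot all be matched at once, the result is only a compromise.

```r
## Sketch: pick gamma parameters whose 15% / 85% quantiles and mean are as
## close as possible (squared log-ratio) to the reported values, then plot.
## The gamma family and this loss are illustrative assumptions.
target <- c(q15 = 0.012, q85 = 0.057, mean = 0.015)

discrepancy <- function(logpar) {
  shape <- exp(logpar[1]); rate <- exp(logpar[2])   # log scale keeps both > 0
  fitted <- c(qgamma(c(0.15, 0.85), shape = shape, rate = rate),
              shape / rate)                         # gamma mean = shape / rate
  sum(log(fitted / target)^2)                       # relative squared error
}

start <- c(log(1), log(1 / 0.03))      # shape = 1, rate = 1/0.03 (mean 0.03)
fit <- optim(start, discrepancy)       # Nelder-Mead by default
shape <- exp(fit$par[1]); rate <- exp(fit$par[2])

## Compare fitted summaries with the reported ones, then draw the density
rbind(target = target,
      fitted = c(qgamma(c(0.15, 0.85), shape, rate), shape / rate))
curve(dgamma(x, shape = shape, rate = rate), from = 0, to = 0.1,
      xlab = "x", ylab = "density")
```

Comparing the two rows shows how far the compromise is from each reported value. Another family can be tried by changing the quantile and mean formulas inside discrepancy().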
>
> So, what were the data?  If they are 'amounts' of any kind, they
> are necessarily non-negative, often strictly positive, and hence
> (according to John W. Tukey) should be analyzed after
> taking logs (Tukey's "first aid transformation" for *any*
> amounts data).
>
> Taking logs and analyzing them means considering a normal
> ("Gaussian") distribution for log(<data>), which is
> equivalent to fitting a lognormal distribution (R functions
> [dpqr]lnorm()) to the original data.  I'd strongly recommend doing that.
>
> I did so, and found, however, that if these really are the
> *mean* and the 15% and 85% quantiles, a lognormal does not
> fit.  Here is the reproducible R code, with lots of
> comments:
>
>
> ##
> ## MM strongly recommends fitting a log-normal distribution .. however, it does *not* fit
>
> ## The "data statistics"
> qlN <- c(q15 = 0.012, q85 = 0.057) # Quantiles, original scale
> mlN <- 0.015                       # Mean, original scale
>
> (qn <- log(qlN))   # Quantiles, log scale [assumed normal (Gaussian) here]
> ## As the Gaussian is symmetric, the midpoint of the two log-quantiles
> ## estimates both its mean and its median:
> (mu <- mean(qn))   # -3.644
> (medlN <- exp(mu)) # 0.02615
> ## This estimates median(.) of the lognormal, but it is *larger* than
> ## the reported mean = 0.015 !
> ## ===> Log-normal does *NOT* fit well:
>
> ## From the help page (?dlnorm), we learn that
> ##        E[lN] = exp(mu + 1/2 sigma^2)    {and it is trivial that}
> ##   median[lN] = exp(mu)
> ## where here, our medlN is a (moment) estimate of median[lN]
>
> ## If the numbers were different, this would solve the problem:
> ## a (moment / plug-in) estimate for sigma is
> (e12sig <- mlN / medlN) ## ~= exp(1/2 sigma^2)
> (sig2 <- 2*log(e12sig)) ## ~= sigma^2  [here *NEGATIVE* (<==> est. median > mean !)]
> (sig <- sqrt(sig2))     ## ~= sigma    [here of course 'NaN', with a warning !]
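The code above takes only the centre of the two log-quantiles. A complementary sketch (not from the thread) also uses their spread: under a lognormal, log(Q0.85) - log(Q0.15) = 2 * qnorm(0.85) * sigma, so both parameters follow from the two quantiles alone, and the implied mean exp(mu + sigma^2/2) can then be compared with the reported 0.015. Variable names here are illustrative.

```r
## Sketch: fit a lognormal to the two reported quantiles only, then compare
## its implied mean with the reported mean.
q15 <- 0.012; q85 <- 0.057   # reported quantiles
m_reported <- 0.015          # reported mean

z85   <- qnorm(0.85)                         # standard normal 85% quantile
mu    <- (log(q15) + log(q85)) / 2           # centre of the log-quantiles
sigma <- (log(q85) - log(q15)) / (2 * z85)   # spread of the log-quantiles

## The fitted lognormal reproduces both quantiles exactly ...
qlnorm(c(0.15, 0.85), meanlog = mu, sdlog = sigma)
## ... but its mean is more than twice the reported one
(implied_mean <- exp(mu + sigma^2 / 2))
implied_mean / m_reported
```

So the lognormal fails whichever pair of statistics is used to pin down its two parameters.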
>
>
> My conclusion would be that distributions other than the
> log-normal (the normal is out of the question !!) have to be
> considered, if you want to be serious about the topic.
>
> The poweRlaw package (https://cloud.r-project.org/package=poweRlaw)
> may help you (it has 4 vignettes, the last being a nice JSS publication).
>
> The above is a "cute" non-standard problem in any case: fitting very skewed
> distributions given only two quantiles and the mean.  The
> approach taken by the "poweRlawyers", namely minimizing the KS
> (Kolmogorov-Smirnov) discrepancy, seems a good start to me.
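Before minimizing any discrepancy, one may check whether any non-negative distribution can reproduce all three numbers at once. Assuming the data are non-negative and continuous (amounts, as suggested above) and the numbers are exactly the reported mean and 15%/85% quantiles: 70% of the mass lies between Q0.15 and Q0.85, hence at or above Q0.15, and 15% lies above Q0.85. Therefore the mean is at least 0.70 * Q0.15 + 0.15 * Q0.85. A minimal sketch of that check, with illustrative variable names:

```r
## Lower bound on the mean of any non-negative continuous distribution
## with P(X <= q15) = 0.15 and P(X <= q85) = 0.85:
##   E[X] >= 0 * 0.15 + q15 * 0.70 + q85 * 0.15
q15 <- 0.012; q85 <- 0.057
m_reported <- 0.015

(mean_lower_bound <- q15 * (0.85 - 0.15) + q85 * (1 - 0.85))  # ~ 0.01695
m_reported >= mean_lower_bound   # FALSE: reported mean is below the bound
```

If this returns FALSE, no non-negative distribution (lognormal, power law or otherwise) matches the three numbers exactly. Either the reported values mean something other than stated, or any fit can only be a compromise.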
>
> Martin Maechler,
> ETH Zurich
>
>
>
>     > Jim
>
>
>     > On Tue, Jul 19, 2016 at 7:11 AM, gcchenleidenuniv
>     > <gcchenleidenuniv at gmail.com> wrote:
>     >> Hi all,
>     >>
>     >> I need to draw density curves based on some published data, but the
>     >> article only gives the mean (0.015) and two quantiles (Q0.15 = 0.012,
>     >> Q0.85 = 0.057). Is it possible to plot density curves based solely on
>     >> the mean and the quantiles? The dnorm(x, mean, sd, log) function needs
>     >> the standard deviation, which was not reported, so it cannot be used
>     >> in this situation.
>     >>
>     >> Many thanks!!
>     >> Daniel
>     >>
>     >> ______________________________________________
>     >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>     >> https://stat.ethz.ch/mailman/listinfo/r-help
>     >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>     >> and provide commented, minimal, self-contained, reproducible code.
>
>
