[R] Help with generating data from a 'not quite' Normal distribution

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Aug 12 14:08:03 CEST 2004


On Thu, 12 Aug 2004, Martin Maechler wrote:

> >>>>> "Vito" == Vito Ricci <vito_ricci at yahoo.com>
> >>>>>     on Thu, 12 Aug 2004 10:59:23 +0200 (CEST) writes:
> 
>     Vito> Hi, also the Cauchy distribution could be good:
> 
>     Vito> rcauchy(n, location = 0, scale = 1)
> 
> "also" is an exaggeration, after you already told him to use the
> t-distribution family:
> 
> Cauchy = t-Dist(*, df = 1) !
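A quick numerical check of that identity, as a sketch using only base R density and distribution functions (the variable names are illustrative): the standard Cauchy density and CDF coincide with those of the t distribution with df = 1. With location a and scale b, `rcauchy(n, a, b)` therefore has the same distribution as `a + b * rt(n, df = 1)`, although the two calls do not return the same numbers for a given seed.

```r
# Compare the standard Cauchy with Student's t on 1 df over a grid of points
x <- seq(-10, 10, by = 0.5)

same_density <- isTRUE(all.equal(dcauchy(x), dt(x, df = 1)))
same_cdf     <- isTRUE(all.equal(pcauchy(x), pt(x, df = 1)))
print(c(density = same_density, cdf = same_cdf))

# Location a and scale b enter as (x - a) / b, with a 1/b Jacobian factor
a <- 2
b <- 3
same_shifted <- isTRUE(all.equal(dcauchy(x, location = a, scale = b),
                                 dt((x - a) / b, df = 1) / b))
print(same_shifted)
```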
> 
> 
>     DCrabb> I would be very grateful for any help from members of
>     DCrabb> this list for what might be a simple problem...
> 
>     DCrabb> We are trying to simulate the behaviour of a clinical
>     DCrabb> measurement in a series of computer experiments. This
>     DCrabb> is simple enough to do in R if we assume the
>     DCrabb> measurements to be Gaussian, but their empirical
>     DCrabb> distribution has a much higher peak at the mean and
>     DCrabb> the distribution has much longer tails. (The
>     DCrabb> distribution is quite symmetrical.) Can anyone suggest
>     DCrabb> any distributions I could fit to this data, and better
>     DCrabb> still how I can then generate random data from this
>     DCrabb> 'distribution' using R?
> 
> I'd first try the t distribution, using fitdistr() from
> package MASS, e.g.,
> 
>   > x <- rt(1000, df = 1.5)
>   > library(MASS)
>   > fx <- fitdistr(x, densfun = "t")
>   > fx
>          m             s            df
>     -0.01396785    1.04338151    1.57749052 
>    ( 0.04426267) ( 0.04766543) ( 0.10809543)
>   > 
> 
> (so it *does* estimate location and scale in addition to the df).
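The second half of the original question, generating random data from the fitted distribution, follows from the fact that if T ~ t(df) then m + s * T has the location-scale t distribution that `fitdistr()` fits. A minimal sketch, assuming simulated data stand in for the clinical measurements (the seed, sample sizes and variable names are illustrative). `fitdistr()` may emit harmless "NaNs produced" warnings while the optimizer tries negative scales.

```r
library(MASS)   # provides fitdistr()

set.seed(1)
x <- rt(1000, df = 1.5)   # stand-in for the observed measurements

# Fit location m, scale s and degrees of freedom df by maximum likelihood
fx  <- fitdistr(x, densfun = "t")
est <- fx$estimate

# If T ~ t(df), then m + s * T follows the fitted location-scale t
n_sim <- 5000
sim <- est[["m"]] + est[["s"]] * rt(n_sim, df = est[["df"]])
summary(sim)
```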
> 
> If you read the help page
>   > ?fitdistr
> 
> you'll see in the example that estimating 'df' is said to be
> problematic.
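The examples on that help page avoid the problem by holding `df` fixed and estimating only location and scale: an argument of the density passed to `fitdistr()` through `...` is held fixed. A sketch on simulated data, where df = 3 is an arbitrary illustrative value and not a recommendation:

```r
library(MASS)

set.seed(2)
x <- rt(500, df = 3)   # simulated example data

# df = 3 is passed through and held fixed; only m and s are estimated
fx3 <- fitdistr(x, densfun = "t", df = 3)
print(fx3)
```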
> AFAIK it can be better to reparametrize, possibly using 1/df or
> log(df) as the new parameter.
> {but then you can't use fitdistr(); you need mle() from package
>  stats4 with the log likelihood, or optim() directly}.
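A minimal sketch of the `optim()` route with df on the log scale, assuming the same location-scale t model that `fitdistr()` uses. Taking log(s) as well keeps every parameter unconstrained. `negloglik`, `start` and the starting values (median, half the IQR, df = 10) are illustrative choices. Swapping `exp(theta[3])` for `1 / theta[3]` would give the 1/df parametrization instead.

```r
set.seed(3)
x <- rt(1000, df = 1.5)   # simulated example data

# Negative log-likelihood of the location-scale t, written in terms of
# theta = (m, log(s), log(df)) so that optim() can search freely
negloglik <- function(theta, x) {
  m  <- theta[1]
  s  <- exp(theta[2])
  df <- exp(theta[3])
  -sum(dt((x - m) / s, df = df, log = TRUE) - log(s))
}

start <- c(median(x), log(IQR(x) / 2), log(10))
fit <- optim(start, negloglik, x = x, method = "BFGS", hessian = TRUE)

# Back-transform to the original parameters
est <- c(m = fit$par[1], s = exp(fit$par[2]), df = exp(fit$par[3]))
print(est)

# Approximate standard errors on the optimization scale (m, log s, log df)
print(sqrt(diag(solve(fit$hessian))))
```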

It is the use of ML for the df that is *in theory* problematic, not the
optimization per se. See the reference on the ?fitdistr help page,
p. 110, for pointers to the literature.
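One practical way to see how weakly a sample pins down df is to profile the log-likelihood: fix df on a grid and maximize over m and s only, using the `loglik` component that `fitdistr()` returns. This is a diagnostic sketch on simulated data, not a demonstration of the theoretical issue above. The grid and the 1.92 cut-off (half of `qchisq(0.95, 1)`) are illustrative, and the usual chi-squared calibration is itself questionable here.

```r
library(MASS)

set.seed(4)
x <- rt(200, df = 4)   # simulated example data

# For each fixed df, maximize the likelihood over m and s only
df_grid <- c(1, 1.5, 2, 3, 4, 6, 10, 20, 50)
prof <- sapply(df_grid, function(d)
  fitdistr(x, densfun = "t", df = d)$loglik)

# Log-likelihood relative to the best grid value; entries within about
# 1.92 of zero would not be clearly rejected under the usual approximation
print(round(prof - max(prof), 2))
```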

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



