[R] Statistical distribution not fitting

Boris Steipe boris.steipe at utoronto.ca
Wed Jul 22 22:50:31 CEST 2015


So - as you can see, your data can be modelled.

Now the interesting question is: what do you do with that knowledge? I know nearly nothing about your domain, but given that the data look log-normal, I am curious about the following:

 - Most of the events are in the small-loss category. But most of the damage is done by the rare large losses. Is it even meaningful to guard against a single 1/1000 event? Shouldn't you be saying: my contingency funds need to be large enough to allow survival of, say, a fiscal year with 99.9 % probability? This is a very different question.

 - If a loss occurs, how quickly do the funds need to be replenished? Do you need to take a series of events into account?

 - The model assumes that the data are independent. This is probably a poor (and dangerous) assumption.
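To make the first point concrete, here is a minimal sketch that contrasts the 99.9 % quantile of a single loss with the 99.9 % quantile of the total loss in one fiscal year. Everything in it is an assumption for illustration: the Poisson event rate `lambda` and the log-normal parameters `meanlog` and `sdlog` are made-up placeholders, not estimates from your data, and the simulation treats losses as independent, which is exactly the assumption questioned above.

```r
## Sketch only: all parameters are hypothetical placeholders, not fitted values.
set.seed(1)
lambda  <- 25     # assumed mean number of loss events per year
meanlog <- 9      # assumed mean of log(loss) for a single event
sdlog   <- 2      # assumed sd of log(loss) for a single event
nyears  <- 1e5    # number of simulated fiscal years

## 99.9 % quantile of a single loss under the assumed log-normal
single_q <- qlnorm(0.999, meanlog, sdlog)

## total loss per simulated year: Poisson event count,
## independent log-normal severities (a year with no events sums to 0)
counts <- rpois(nyears, lambda)
annual <- vapply(counts,
                 function(n) sum(rlnorm(n, meanlog, sdlog)),
                 numeric(1))

## 99.9 % quantile of the annual total
annual_q <- quantile(annual, 0.999, names = FALSE)

c(single = single_q, annual = annual_q)
```

With many events per year the annual figure is necessarily larger than the single-event figure, so the two questions lead to different capital numbers. Dependence between losses would have to be modelled separately.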

Cheers,
B.





On Jul 22, 2015, at 3:56 PM, Ben Bolker <bbolker at gmail.com> wrote:

> Amelia Marsh <amelia_marsh08 <at> yahoo.com> writes:
> 
> 
>> Hello!  (I don't know if I can raise this query here on this forum,
>> but I had already raised it on the finance forum and have not received
>> any suggestion, so I am now raising it on this list. Sorry for the same. The
>> query is about what to do if no statistical distribution fits
>> the data).
> 
>> I am into risk management and deal with Operational risk. As a part
>> of BASEL II guidelines, we need to arrive at the capital charge the
>> banks must set aside to counter any operational risk, if it
>> happens. As a part of Loss Distribution Approach (LDA), we need to
>> collate past loss events and use these loss amounts. The usual
>> process as being practised in the industry is as follows -
> 
>> Using these historical loss amounts and using the various
>> statistical tests like KS test, AD test, PP plot, QQ plot etc, we
>> try to identify best statistical (continuous) distribution fitting
>> this historical loss data. Then using these estimated parameters
>> w.r.t. the statistical distribution, we simulate say 1 million loss
>> amounts and then taking appropriate percentile (say 99.9%), we
>> arrive at the capital charge.
> 
>> However, many times, loss data is such that fitting of
>> distribution to loss data is not possible. Maybe loss data is
>> multimodal or has significant variability, making the fitting of
>> distribution impossible.  Can someone guide me how to deal with such
>> data and what can be done to simulate losses using this historical
>> loss data in R.
> 
> A skew-(log)-normal fit doesn't look too bad ... (whenever you
> have positive data that are this strongly skewed, log-transforming
> is a good step)
> 
> hist(log10(mydat),col="gray",breaks="FD",freq=FALSE)
> ## default breaks are much coarser:
> ## hist(log10(mydat),col="gray",breaks="Sturges",freq=FALSE)
> lines(density(log10(mydat)),col=2,lwd=2)
> library(fGarch)
> ss <- snormFit(log10(mydat))
> xvec <- seq(2,6.5,length=101)
> lines(xvec,do.call(dsnorm,c(list(x=xvec),as.list(ss$par))),
>      col="blue",lwd=2)
> ## or try a skew-Student-t: not very different:
> ss2 <- sstdFit(log10(mydat))
> lines(xvec,do.call(dsstd,c(list(x=xvec),as.list(ss2$estimate))),
>      col="purple",lwd=2)
> 
> There are more flexible distributional families (Johnson,
> log-spline ...)
> 
> Multimodal data are a different can of worms -- consider
> fitting a finite mixture model ...
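For the multimodal case, the finite mixture idea can be sketched in base R as a two-component normal mixture on the log10 scale, fitted by EM. This is only an illustration under stated assumptions: `x` is synthetic stand-in data (not `mydat`), `fit_mix2` is a hypothetical helper written for this sketch, and the choice of two components is arbitrary. Dedicated packages for mixture models would be the more robust route for real data.

```r
## Sketch: two-component normal mixture fitted by EM.
## 'x' is synthetic stand-in data on the log10 scale, not the poster's 'mydat'.
set.seed(42)
x <- c(rnorm(300, mean = 3, sd = 0.3), rnorm(200, mean = 5, sd = 0.4))

fit_mix2 <- function(x, maxit = 500, tol = 1e-8) {
  ## crude starting values: lower and upper quartiles as the two means
  mu <- unname(quantile(x, c(0.25, 0.75)))
  sigma <- rep(sd(x) / 2, 2)
  p <- 0.5                      # mixing weight of component 1
  ll <- ll_old <- -Inf
  for (i in seq_len(maxit)) {
    ## E-step: posterior probability that each point is from component 1
    d1 <- p * dnorm(x, mu[1], sigma[1])
    d2 <- (1 - p) * dnorm(x, mu[2], sigma[2])
    w <- d1 / (d1 + d2)
    ## log-likelihood at the parameters used in this E-step
    ll <- sum(log(d1 + d2))
    ## M-step: weighted updates of weight, means and standard deviations
    p <- mean(w)
    mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
    sigma <- c(sqrt(sum(w * (x - mu[1])^2) / sum(w)),
               sqrt(sum((1 - w) * (x - mu[2])^2) / sum(1 - w)))
    if (abs(ll - ll_old) < tol) break
    ll_old <- ll
  }
  list(p = p, mu = mu, sigma = sigma, loglik = ll)
}

fit <- fit_mix2(x)

## simulate new losses from the fitted mixture, back on the original scale
n_sim <- 1e5
comp <- ifelse(runif(n_sim) < fit$p, 1, 2)
sim_loss <- 10^rnorm(n_sim, fit$mu[comp], fit$sigma[comp])
```

The simulated `sim_loss` values can then be fed into the same percentile calculation as in the single-distribution case.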
> 


