[R] Restricted Simulation from GPD & Normal Distributions

Preetam Pal lordpreetam at gmail.com
Tue Dec 27 11:22:16 CET 2016


Hi R-users,
(fixing some typos in my previous mail)

I have data on one equity-related variable X, denoted by x1, x2, ..., x1000,
sorted so that x1 < x2 < ... < x1000. I have identified the lower and upper
5th percentiles, i.e. x50 and x950 respectively. Based on some analysis, I
have inferred that three different density functions fit the three parts of
the data reasonably well:

   - f1 fits the data well for all x <= x50 (50 observations)
   - f2 fits the data well for all x50 < x <= x950 (900 observations)
   - f3 fits the data well for all x > x950 (50 observations)
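The split described above can be sketched in Python. It uses the standard library only and serves as a language-neutral illustration, since the post itself has no code. `split_at_order_stats` is a hypothetical helper. It uses the half-open convention x <= x50, x50 < x <= x950, x > x950, which is what yields exactly 50, 900 and 50 points. This assumes the values are distinct; ties at a cut-off would shift the counts.

```python
def split_at_order_stats(data, k_low=50, k_high=950):
    """Split a sample at its k_low-th and k_high-th order statistics.

    Returns (x_low, x_high, lower, middle, upper) where
      lower  = values <= x_low
      middle = values in (x_low, x_high]
      upper  = values > x_high
    """
    xs = sorted(data)
    # 1-based ranks (x50, x950) -> 0-based list indices
    x_low, x_high = xs[k_low - 1], xs[k_high - 1]
    lower = [x for x in xs if x <= x_low]
    middle = [x for x in xs if x_low < x <= x_high]
    upper = [x for x in xs if x > x_high]
    return x_low, x_high, lower, middle, upper


# Toy sample: the numbers 1..1000 in reverse order
x_low, x_high, lower, middle, upper = split_at_order_stats(
    [float(i) for i in range(1000, 0, -1)])
print(x_low, x_high, len(lower), len(middle), len(upper))  # 50.0 950.0 50 900 50
```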

The idea is to simulate 50 new observations from f1 *restricted to
(-infinity, x50]*, 900 new observations from f2 *restricted to (x50, x950]*
and 50 new observations from f3 *restricted to (x950, infinity)*. The
simulated data then has 1000 observations in total, as before.
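One standard way to draw from a distribution restricted to an interval is inverse-CDF sampling. The post does not prescribe a method, so this is only one option. For a continuous CDF F with density f and an interval (a, b] with F(a) < F(b), draw U uniformly between F(a) and F(b) and transform it:

```latex
U \sim \mathrm{Uniform}\bigl(F(a),\, F(b)\bigr), \qquad X = F^{-1}(U)
\\[4pt]
P(X \le x) = P\bigl(U \le F(x)\bigr) = \frac{F(x) - F(a)}{F(b) - F(a)}, \qquad a < x \le b
\\[4pt]
f_X(x) = \frac{f(x)}{F(b) - F(a)} \, \mathbf{1}\{a < x \le b\}
```

So X has exactly the density of F renormalized to (a, b]. For f1 on (-infinity, x50], take F(a) = 0. For f3 on (x950, infinity), take F(b) = 1. No draws are wasted, unlike rejection sampling.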

For the example I am working with, f1 and f3 are GPDs (Generalized Pareto
Distributions), while f2 is Normal with some parameters.
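The inverse-CDF approach needs the GPD's CDF and quantile function, and both have closed forms. Below is a minimal sketch, assuming the usual GPD(loc, scale, xi) parameterization. All function names are hypothetical. The post does not say how the lower-tail GPD f1 was parameterized, so the mirrored form used here (x50 - X modeled as a GPD with location 0) is an assumption.

```python
import math


def gpd_cdf(y, loc, scale, xi):
    """P(Y <= y) for Y ~ GPD(loc, scale, xi), with scale > 0."""
    if y <= loc:
        return 0.0
    z = (y - loc) / scale
    if xi == 0.0:
        return 1.0 - math.exp(-z)      # exponential special case
    t = 1.0 + xi * z
    if t <= 0.0:                       # beyond the finite upper endpoint (xi < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def gpd_ppf(p, loc, scale, xi):
    """Quantile function (inverse of gpd_cdf) for 0 <= p < 1."""
    if xi == 0.0:
        return loc - scale * math.log1p(-p)
    return loc + scale * ((1.0 - p) ** (-xi) - 1.0) / xi


# Assumed lower-tail model for f1: x50 - X ~ GPD(0, scale, xi), support (-inf, x50].
def lower_gpd_cdf(x, x50, scale, xi):
    """P(X <= x) = P(x50 - X >= x50 - x) = 1 - G(x50 - x)."""
    return 1.0 - gpd_cdf(x50 - x, 0.0, scale, xi)


def lower_gpd_ppf(p, x50, scale, xi):
    """Inverse of lower_gpd_cdf for 0 < p <= 1 (written directly in p
    to avoid the rounding error of computing 1 - p)."""
    if xi == 0.0:
        return x50 + scale * math.log(p)
    return x50 - scale * (p ** (-xi) - 1.0) / xi
```

If a GPD's location sits exactly at its threshold (x950 for f3, x50 for the mirrored f1), its support already coincides with the tail interval, and the restriction changes nothing. It only matters if the tail fits used a different location.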

I want to write a function which will take as inputs

   - the entire data (of size 1000)
   - the cut-off points x50 and x950
   - the 3 distributions (along with their parameters)
   - the number of data points from each of the 3 segments (50, 900, 50 in
   this example)
   - note that f1, f2 and f3 need to be properly restricted to the
   corresponding intervals (marked with asterisks, i.e. in bold, above)

and will output the simulated data with original sample size (here, 1000).
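Below is a self-contained sketch of such a function. It is not a definitive implementation. `simulate_spliced`, `_draw_restricted`, `gpd_lower` and `gpd_upper` are hypothetical names. Passing each distribution as a `(cdf, ppf)` pair is an interface assumption. The demo parameters (scale 0.5, shape 0.2, standard Normal) and the toy data are made up, not fitted to anything. Each segment is sampled by inverse-CDF draws restricted to its interval. If no counts are given, they default to the number of observed points in each segment.

```python
import math
import random
from statistics import NormalDist


def _draw_restricted(cdf, ppf, lower, upper, n, rng):
    """n inverse-CDF draws from a distribution restricted to (lower, upper]."""
    p_lo, p_hi = cdf(lower), cdf(upper)
    if not p_lo < p_hi:
        raise ValueError("segment has zero probability under its distribution")
    draws = []
    while len(draws) < n:
        u = p_lo + (p_hi - p_lo) * rng.random()
        if 0.0 < u < 1.0:  # quantile functions are undefined at exactly 0 and 1
            draws.append(ppf(u))
    return draws


def simulate_spliced(data, x_low, x_high, dists, counts=None, seed=None):
    """Simulate len(data) values, counts[i] of them from dists[i] on its segment.

    dists  -- three (cdf, ppf) pairs for f1, f2, f3
    counts -- draws per segment; defaults to the number of observations in
              (-inf, x_low], (x_low, x_high] and (x_high, inf)
    """
    if counts is None:
        counts = (sum(x <= x_low for x in data),
                  sum(x_low < x <= x_high for x in data),
                  sum(x > x_high for x in data))
    if sum(counts) != len(data):
        raise ValueError("segment counts must add up to len(data)")
    rng = random.Random(seed)
    segments = [(-math.inf, x_low), (x_low, x_high), (x_high, math.inf)]
    sim = []
    for (cdf, ppf), (lo, hi), n in zip(dists, segments, counts):
        sim.extend(_draw_restricted(cdf, ppf, lo, hi, n, rng))
    return sorted(sim)  # sorted, like the x1 < ... < x1000 ordering of the data


# --- Illustrative usage with made-up parameters (xi > 0 assumed) ---
def gpd_upper(threshold, scale, xi):
    """(cdf, ppf) of threshold + GPD(0, scale, xi)."""
    cdf = lambda y: (0.0 if y <= threshold
                     else 1.0 - (1.0 + xi * (y - threshold) / scale) ** (-1.0 / xi))
    ppf = lambda p: threshold + scale * ((1.0 - p) ** (-xi) - 1.0) / xi
    return cdf, ppf


def gpd_lower(threshold, scale, xi):
    """(cdf, ppf) of threshold - GPD(0, scale, xi)."""
    cdf = lambda x: (1.0 if x >= threshold
                     else (1.0 + xi * (threshold - x) / scale) ** (-1.0 / xi))
    ppf = lambda p: threshold - scale * (p ** (-xi) - 1.0) / xi
    return cdf, ppf


normal = NormalDist(mu=0.0, sigma=1.0)
data = [normal.inv_cdf((i - 0.5) / 1000) for i in range(1, 1001)]  # toy sorted sample
x50, x950 = data[49], data[949]
dists = [gpd_lower(x50, 0.5, 0.2), (normal.cdf, normal.inv_cdf), gpd_upper(x950, 0.5, 0.2)]
sim = simulate_spliced(data, x50, x950, dists, seed=2016)
```

Returning the result sorted is a choice made to mirror the ordered input. Drop the `sorted` call if the draws should stay grouped by segment.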

I would really appreciate any help writing this function. If anything else
is required, please let me know.



-- 
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Statistics Division,
Indian Statistical Institute, Kolkata.
Room No. N-114, C.V. Raman Hall, B.H.O.S.




More information about the R-help mailing list