[R] Generating samples from truncated multivariate Student-t distribution

Czarek Kowalski czarek230800 at gmail.com
Wed May 10 20:02:18 CEST 2017


Previously I had used another language to make calculations based on
theory. I have calculated using R and I have received another results.
My theoretical calculation does not take into account the full
covariance matrix (only 6 elements from diagonal). Code based on
theory:

df = 4;   #degrees of freedom
sigmas = c(1, 4, 2, 5, 3, 6) # roots of diagonal elements of covariance matrix
meann = c(55, 40, 50, 35, 45, 30)
alfa1 = 20; # lower truncation
beta1 = 60; # upper truncation
a = (alfa1 - meann)/sigmas;
b = (beta1 - meann)/sigmas;
E = meann + sigmas * ((gamma(df - 1)/2)*((df + a^2)^(-(df-1)/2) - (df
+ b^2)^(-(df-1)/2))*df^(df/2))/(2*(pt(b,df)-pt(a,df))*gamma(df/2)*gamma(1/2))
E


Kind regards
Czarek

On 9 May 2017 at 23:50, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On May 9, 2017, at 2:33 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>>
>>> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>
>>> I have already posted that in attachement - pdf file.
>>
>> I see that now. I failed to scroll to the 3rd page.
>>
>>> I am posting
>>> plain text here:
>>>
>>>> library(tmvtnorm)
>>>> meann = c(55, 40, 50, 35, 45, 30)
>>>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
>>>                   1, 16, -6, -6, -2, 12,
>>>                   0, -6, 4, 2, -2, -5,
>>>                   2, -6, 2, 25, 0, -17,
>>>                  -1, -2, -2, 0, 9, -5,
>>>                  -1, 12, -5, -17, -5, 36), 6, 6)
>>> df = 4
>>> lower = c(20, 20, 20, 20, 20, 20)
>>> upper = c(60, 60, 60, 60, 60, 60)
>>> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)
>>>
>>>
>>>> sum(X1[,1]) / 100000
>>> [1] 54.98258
>>> sum(X1[,2]) / 100000
>>> [1] 40.36153
>>> sum(X1[,3]) / 100000
>>> [1] 49.83571
>>> sum(X1[,4]) / 100000
>>> [1] 34.69571      # "4th element of mean vector"
>>> sum(X1[,5]) / 100000
>>> [1] 44.81081
>>> sum(X1[,6]) / 100000
>>> [1] 31.10834
>>>
>>> And corresponding results received using equation (3) from pdf file:
>>> [54.97,
>>> 40,
>>> 49.95,
>>> 35.31, #  "4th element of mean vector"
>>> 44.94,
>>> 31.32]
>>>
>>
>> I get similar results for the output from your code,
>>
>> My 100-fold run of your calculations were:
>>
>> meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df, lower, upper)
>> colMeans(Xbig)} )
>>
>> describe(meansBig[4,])  # describe is from Hmisc package
>>
>> meansBig[4, ]
>>       n  missing distinct     Info     Mean      Gmd      .05      .10      .25
>>     100        0      100        1     34.7  0.01954    34.68    34.68    34.69
>>     .50      .75      .90      .95
>>   34.70    34.72    34.72    34.73
>>
>> lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
>> highest: 34.72939 34.73012 34.73051 34.73742 34.74441
>>
>>
>> So agree, 35.31 is outside the plausible range of an RV formed with that package, but I don't have any code relating to your calculations from theory.
>
> Further investigation:
>
> covDiag <- covv*( row(covv)==col(covv) )  # just the diagonal means
>
> Repeat with all zero covariances:
>
>> meansDiag <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covDiag, df, lower, upper)
> + colMeans(Xbig)} )
>> describe(meansDiag[4,])
> meansDiag[4, ]
>        n  missing distinct     Info     Mean      Gmd      .05      .10      .25
>      100        0      100        1    35.23  0.02074    35.21    35.21    35.22
>      .50      .75      .90      .95
>    35.23    35.25    35.26    35.26
>
> lowest : 35.18360 35.19756 35.20098 35.20179 35.20622
> highest: 35.26367 35.26635 35.26791 35.27251 35.27302
>
> So failing to account for the covariances in your theoretical calculations mostly explains the apparent discrepancy, although your value of 35.31 would be at the  far end of a statistical distribution and I wonder about some sort of error in your theoretical calculation, which didn't appear to take into account the covariance matrix.
>
> Best;
> David.
>
>
>
>>
>> Best;
>> David.
>>
>>
>>> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>
>>>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>>>
>>>>> Of course I have expected the difference between theory and a sample
>>>>> of realizations of RV's and result mean should still be a random
>>>>> variable. But, for example for 4th element of mean vector: 35.31 -
>>>>> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
>>>>> expected that the difference would be smaller because of law of large
>>>>> numbers (for 10mln samples the difference is quite similar).
>>>>
>>>> I for one have no idea what is meant by a "4th element of mean vector". So I have now idea what to consider "big". I have found that my intuitions about multivariate distributions, especially those where the covariate structure is as complex as you have assumed, are often far from simulated results.
>>>>
>>>> I suggest you post some code and results.
>>>>
>>>> --
>>>> David.
>>>>
>>>>
>>>>>
>>>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>>
>>>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>>>>>
>>>>>>> Dear Members,
>>>>>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>>>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>>>>>> from truncated multivariate Student-t distribution using rtmvt
>>>>>>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>>>>>>> using equation (3) from attached pdf. The problem is, that after
>>>>>>> summing all elements in one column of rtmvt result (and dividing by
>>>>>>> 100 000) I do not receive the same result as using (3) equation. Could
>>>>>>> You tell me, what is incorrect, why there is a difference?
>>>>>>
>>>>>> I guess the question is why you would NOT expect a difference between theory and a sample of realizations of RV's? The result mean should still be a random variable, night wahr?
>>>>>>
>>>>>>
>>>>>>> Yours faithfully
>>>>>>> Czarek Kowalski
>>>>>>> <truncatedT.pdf>______________________________________________
>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>> David Winsemius
>>>>>> Alameda, CA, USA
>>>>>>
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>



More information about the R-help mailing list