[R] order statistic of multivariate normal

Fri Mar 22 23:31:10 CET 2013

On Mar 22, 2013, at 8:20 AM, Ranjan Maitra wrote:

> I don't believe that you necessarily need to use simulation for this.
> But you do need numerical integration. Here is a skeletal approach.
> 
> Calculate the density (distribution) of the order statistics of a
> multivariate sample.

That does not make a lot of sense to me. How were you proposing to do this calculation?

> Then since the underlying distribution is
> multivariate normal, use a multivariate integration routine in R (try
> the mnormt package) to get the integration part of the calculation and
> proceed.
> 
> As I said before, here is the outline of an approach I would first take.
> You get to work through the details:-)

An "order statistic" implies that a sample of some particular size has been drawn and sorted a countable sequence, from some distribution, possibly theoretic or empiric.  And to me anyway, it implies that these be IID draws, possibly from a mixture distribution but still IID draws from a common distribution. I think the ordering  is requested to be the multivariate probability integral from c(-Inf, -Inf), although other specifications could be imagined. This is actually a bit ambiguous since the set of coordinates for which Pr((x,y)| mvN( c(0,0), sigma) ) = a will be a curve,  or in teh dim > 2 case a surface of dim one less than the dimension of the sample space. In the unlikely event of ties, these could be resolved by further ordering by sufficient point coordinates to resolve. This is an example in R^2.

library(mvtnorm)
distmv4.10 <- replicate(1000, 
  {mat <- rmvnorm(10, mean=c(0,0), sigma=diag(2)) # 10 IID draws from predetermined distribution
   mat <- cbind(mat, apply(mat, 1, function( x) 
                 pmvnorm( lower=c(-Inf, -Inf), upper=x, mean=c(0,0), sigma=diag(2)) ))
   mat[order(mat[,3]), ][4,1:2]}   # pick the fourth "order statistic",  an x,y pair.
                    )
plot(t(distmv4.10), cex=0.3)
title("Location of the 4th order statistic from replicated\n 10 IID draws from MVN( c(0,0, sigma= diag(2) )")

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rep1000.mvn4.from.10.png
Type: image/png
Size: 121666 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130322/040f4015/attachment.png>
-------------- next part --------------

The theoretical curve that I imagine to be somewhat 'hyperboloid' for which Pr((x,y)|mvN(c(0,0), sigma) ) = 3/9 should run through this cloud of points (3 points will be below and 6 points above). That is what I imagined to be meant by "Calculate the density (distribution) of the order statistics", but it seemed not particularly amenable to calculation. It's pretty clear after plotting that my naive intuition that the asymptotes should lie at x=qnorm(3/9) and y=qnorm(3/9) is ill-founded. So if someone wants to educate me or dispute any of this, I'm susceptible to further argument.

-- 
David.

> Ranjan 
> 
> On Fri, 22 Mar 2013 10:48:06 -0400 li li <hannah.hlx at gmail.com> wrote:
> 
>> Yes. What I meant is "the distribution of order statistics from a
>> non-iid sample of a (normal) distribution with specified sample
>> covariance matrix".
>> Thanks for the idea of simulation.  I guess there is no other way
>> around.
>>     Hanna
>> 
>> 2013/3/22 Bert Gunter <gunter.berton at gene.com>
>> 
>>> As you suggest, Ted, it appears from the question that the OP really means
>>> "order statistics of a sample of 10 from the distribution."  So what she
>>> appears to want is the distribution of order statistics from a non-iid
>>> sample of a (normal) distribution with specified sample covariance matrix.
>>> 
>>> The independent case is standard first statistics course stuff, but I
>>> believe this would require a 10-d integral (please correct if wrong!) for
>>> non-iid.  So it would seem that simulation would be the simplest approach,
>>> and, indeed, should be straightforward. E.g. the mvrnorm() function in MASS
>>> could be used to simulate the samples.
>>> 
>>> Again, corrections appreciated if I am wrong on any of this.
>>> 
>>> -- Bert
>>> 
>>> 
>>> 
>>> On Fri, Mar 22, 2013 at 6:31 AM, Ted Harding <Ted.Harding at wlandres.net>wrote:
>>> 
>>>> On 22-Mar-2013 13:02:25 li li wrote:
>>>>> Thank you all for the reply.
>>>>> 
>>>>> One example of my question is as follows.
>>>>> 
>>>>> Suppose X1, ..., X10 has multivariate normal distribution
>>>>> and X(1), ..., X(10) are the corresponding order statistics.
>>>>> 
>>>>> My question is that whether there is a R function that would
>>>>> help compute the c which satisfies
>>>>> P(X(4) <c)=beta.
>>>>> Here beta is a known constant between 0 and 1.
>>>>> 
>>>>> Thank you.
>>>>>  Hanna
>>>> 
>>>> The basic question which needs to be answered (which has been hinted
>>>> at in earlier replis) is: How do you define "order statistic" for
>>>> multivariate observations?
>>>> 
>>>> For example, here is a sample of 10 (to 3 d.p.) from a bivariate
>>>> normal distribution:
>>>> 
>>>>         [,1]   [,2]
>>>>  [1,]  1.143 -0.396
>>>>  [2,] -0.359 -0.217
>>>>  [3,] -0.391 -0.601
>>>>  [4,] -0.416 -1.093
>>>>  [5,] -1.810 -1.499
>>>>  [6,] -0.367 -0.636
>>>>  [7,] -2.238  0.563
>>>>  [8,]  0.811  1.230
>>>>  [9,]  0.082  0.174
>>>> [10,] -1.359 -0.364
>>>> 
>>>> Which one of these 10 rows is X(4)?
>>>> 
>>>> There is an alternative interpretation of your question:
>>>> 
>>>> "Suppose X1, ..., X10 has multivariate normal distribution
>>>> and X(1), ..., X(10) are the corresponding order statistics."
>>>> 
>>>> This could mean that the vector (X1,...,X10) has a multivariate
>>>> normal distribution with 10 dimensions, and, for a single vector
>>>> (X1,...,X10) drawn from this distribution, (X(1), ..., X(10))
>>>> is a vector consisting of these same values (X1,...,X10), but
>>>> in increasing order.
>>>> 
>>>> Is that what you mean?
>>>> 
>>>> Hoping this helps,
>>>> Ted.
>>>> 
>>> 
>>> --
>>> 
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
> 

David Winsemius
Alameda, CA, USA