[R] A vector of normal distributed values with a sum-to-zero constraint

Frede Aakmann Tøgersen frtog at vestas.com
Wed Apr 2 09:01:02 CEST 2014


Hi Marc

I think that we could help you better if we knew in which context you need sample from a sum constrained normal distribution. However this is more a question on probability theory than on how to do it in R.

The proposal so far has been linear transformation of multivariate normal distribution (Marc, Rui), mixture of normal and reflected normal distribution (Boris, try that with e.g. mu = 2), normal distribution mixed with single point with positive mass (Jlucke), degenerated normal distribution (Greg).

What you in fact want to do is to draw samples from a conditional distribution. The condition is the sum constraint so if we have x = (x1, x2, ..., xn) then sum_{i=1}^n xi = 0 or x1 + x2 + ... x{n-1} = xn so you want to draw samples from P(x given that x is normal distributed and sum(x)=0). The sum constraint gives in fact what is called distributions on the simplex. Google for "normal distribution simplex" and you will get almost 2 mill hits. The second shows how to sample using Gibbs sampling (http://dobigeon.perso.enseeiht.fr/papers/Dobigeon_TechReport_2007b.pdf).

However you can probably just use other distributions given sum constraint since you say that you only need the sample as initial values for a MCMC algorithm. Many methods are available from "compositional statistics" (google for that, Aitchison 1986 is the pioneer). 

At least two packages are available for R:"compositions" with the latest version from 2013 and can be found in the archives and "robComposition" still maintained.

Hope that helps, it is help for yourself to find a solution.


Yours sincerely / Med venlig hilsen


Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling

Technology & Service Solutions
T +45 9730 5135
M +45 2547 6050
frtog at vestas.com
http://www.vestas.com

Company reg. name: Vestas Wind Systems A/S
This e-mail is subject to our e-mail disclaimer statement.
Please refer to www.vestas.com/legal/notice
If you have received this e-mail in error please contact the sender. 


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Marc Marí Dell'Olmo
> Sent: 1. april 2014 16:57
> To: Boris Steipe
> Cc: r-help at r-project.org
> Subject: Re: [R] A vector of normal distributed values with a sum-to-zero
> constraint
> 
> Boris is right. I need this vector to include as initial values of a
> MCMC process (with openbugs) and If I use this last approach sum(x)
> could be a large (or extreme) value and can cause problems.
> 
> The other approach x <- c(x, -x) has the problem that only vectors
> with even values are obtained.
> 
> Thank you!
> 
> 
> 2014-04-01 16:25 GMT+02:00 Boris Steipe <boris.steipe at utoronto.ca>:
> > But the result is not Normal. Consider:
> >
> > set.seed(112358)
> > N <- 100
> > x <- rnorm(N-1)
> > sum(x)
> >
> > [1] 1.759446   !!!
> >
> > i.e. you have an outlier at 1.7 sigma, and for larger N...
> >
> > set.seed(112358)
> > N <- 10000
> > x <- rnorm(N-1)
> > sum(x)
> > [1] -91.19731
> >
> > B.
> >
> >
> > On 2014-04-01, at 10:14 AM, JLucke at ria.buffalo.edu wrote:
> >
> >> The sum-to-zero constraint imposes a loss of one degree of freedom.  Of
> N samples, only (N-1) can be random.   Thus the solution is
> >> > N <- 100
> >> > x <- rnorm(N-1)
> >> > x <- c(x, -sum(x))
> >> > sum(x)
> >> [1] -7.199102e-17
> >>
> >> >
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> Boris Steipe <boris.steipe at utoronto.ca>
> >> Sent by: r-help-bounces at r-project.org
> >> 04/01/2014 09:29 AM
> >>
> >> To
> >> Marc Marí Dell'Olmo <marceivissa at gmail.com>,
> >> cc
> >> "r-help at r-project.org" <r-help at r-project.org>
> >> Subject
> >> Re: [R] A vector of normal distributed values with a sum-to-zero
> constraint
> >>
> >>
> >>
> >>
> >>
> >> Make a copy with opposite sign. This is Normal, symmetric, but no longer
> random.
> >>
> >>  set.seed(112358)
> >>  x <- rnorm(5000, 0, 0.5)
> >>  x <- c(x, -x)
> >>  sum(x)
> >>  hist(x)
> >>
> >> B.
> >>
> >> On 2014-04-01, at 8:56 AM, Marc Marí Dell'Olmo wrote:
> >>
> >> > Dear all,
> >> >
> >> > Anyone knows how to generate a vector of Normal distributed values
> >> > (for example N(0,0.5)), but with a sum-to-zero constraint??
> >> >
> >> > The sum would be exactly zero, without decimals.
> >> >
> >> > I made some attempts:
> >> >
> >> >> l <- 1000000
> >> >> aux <- rnorm(l,0,0.5)
> >> >> s <- sum(aux)/l
> >> >> aux2 <- aux-s
> >> >> sum(aux2)
> >> > [1] -0.000000000006131392
> >> >>
> >> >> aux[1]<- -sum(aux[2:l])
> >> >> sum(aux)
> >> > [1] -0.00000000000003530422
> >> >
> >> >
> >> > but the sum is not exactly zero and not all parameters are N(0,0.5)
> >> > distributed...
> >> >
> >> > Perhaps is obvious but I can't find the way to do it..
> >> >
> >> > Thank you very much!
> >> >
> >> > Marc
> >> >
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list