[R] options in 'rnorm' to set the lower bound of normal dist

(Ted Harding) Ted.Harding at manchester.ac.uk
Thu Mar 27 15:00:48 CET 2008


Brian Ripley's suggestions of truncated normal and
log-normal are of course resources for ensuring that
you get positive simulated results.

However, I think youre real problem (having looked at
the numbers you quote) is that you should not be thinking
of using a normal distribution, or anything similar, at all.

Your variables dat$x and dat$y have:
   mean(dat$x)
## [1] 0.3126667
   sd(dat$x)
## [1] 0.3372137

   mean(dat$y)
## [1] 0.4223137
   sd(dat$y)
## [1] 0.5120841

so in both cases the SD is about the same as the mean, and
getting negative values from simulated normal distributions
with these means and SDs is inevitable.

But now look at the two series of 15 numbers in dat$x and dat$y:
The first 12 are of the order of 0.15 and 0.20 respectively,
while the final 3 in each case are of the order of 1.0 and 1.5
respectively. And this is where your large SD is coming from.
Neither series is from a simple normal distribution.

   mean(dat$x[1:12])
## [1] 0.1519167
   sd(dat$x[1:12])
## [1] 0.02447432
and it is impossible for the last 3 ( 1.032 0.803 1.032)
to have come from a normal distribution giving the first 12.

   mean(dat$y[1:12])
## [1] 0.1807932
   sd(dat$y[1:12])
## [1] 0.03380083
with a similar conclusion; and likewise for the last three
1.551043 1.063100 1.551043

Note that, for the first 12 in each case, the SD less that
1/5 of the mean:

   mean(dat$x[1:12])/sd(dat$x[1:12])
## [1] 6.207186
   mean(dat$y[1:12])/sd(dat$y[1:12])
## [1] 5.348779

so, where the first 12 of x and y are concerned, if you sampled
from normal distributions with the same means and SDs you would
get a negative number with probability less than

   pnorm(-5)
## [1] 2.866516e-07

What you in fact have here is that the numbers are in two groups,
each with a small SD relative to its mean:

             dat$x             dat$y
          Mean     SD       Mean     SD
------------------------+-------------------
 1:12    0.152   0.024  |  0.181   0.034
                        |
13:15    0.956   0.132  |  1.39    0.282

Note that for dat$x Mean/SD approx = 6 for each sub-series,
and for data$y Mean/SD approx = 5 for each subseries, so
you could be looking at results which display a nearly
constant coefficient of variation. Now, this is indeed a
property of the log-normal distribution (as well as of
others), so that could indeed be worth considering.
However, you still have to account for the apparent split
noted above into distinct groups.

So you are really facing a modelling question: why did
the numbers come out as they did, and what is a good way
to represent that mechanism as a distribution?

With best wishes,
Ted.

On 27-Mar-08 12:27:55, Tom Cohen wrote:
> 
> Dear list, 
>   I have a dataset containing values obtained from two different
> instruments (x and y).
> I want to generate 5 samples from normal distribution for each
> instrument based on 
> their means and standard deviations. The problem is values from both
> instruments are 
> non-negative, so if using rnorm I would get some negative values. Is
> there any options
> to determine the lower bound of normal distribution to be 0 or can I
> simulate the 
> samples in different ways to avoid the negative values? 
>    
>    
>   > dat
>     id     x         y
> 75 101 0.134 0.1911315
> 79 102 0.170 0.1610306
> 76 103 0.134 0.1911315
> 84 104 0.170 0.1610306
> 74 105 0.134 0.1911315
> 80 106 0.170 0.1610306
> 77 107 0.134 0.1911315
> 81 108 0.170 0.1610306
> 82 109 0.170 0.1610306
> 78 111 0.170 0.1610306
> 83 112 0.170 0.1610306
> 85 113 0.097 0.2777778
> 2  201 1.032 1.5510434
> 1  202 0.803 1.0631001
> 5  203 1.032 1.5510434
>   
> mu<-apply(dat[,-1],2,mean)
> sigma<-apply(dat[,-1],2,sd)
>   len<-5
> n<-20
> s1<-vector("list",len)
>   set.seed(7)
> for(i in 1:len){
>     s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
>                              y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
>                }
>      
> Thanks for any help,
> Tom
> 
>        
> ---------------------------------
> Sök efter kärleken! 
> 
>       [[alternative HTML version deleted]]
> 

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 27-Mar-08                                       Time: 14:00:44
------------------------------ XFMail ------------------------------



More information about the R-help mailing list