[R] set the lower bound of normal distribution to 0 ?

Tue Apr 1 14:49:36 CEST 2008

Dear Tom,

In my opinion you should first transform your data to the log scale and then calculate the mean and st.dev. of the log-transformed data, because mean(log(x)) is not equal to log(mean(x)).
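A minimal sketch of that advice, assuming the x and y values are the ones listed in the `dat` table quoted further down in this thread. The variable names mean_logx, sd_logx, mean_logy and sd_logy follow Tom's message; nothing here is specific to urlnorm.

```r
# x and y as listed in the 'dat' table quoted later in this thread
x <- c(0.134, 0.170, 0.134, 0.170, 0.134, 0.170, 0.134, 0.170,
       0.170, 0.170, 0.170, 0.097, 1.032, 0.803, 1.032)
y <- c(0.1911315, 0.1610306, 0.1911315, 0.1610306, 0.1911315, 0.1610306,
       0.1911315, 0.1610306, 0.1610306, 0.1610306, 0.1610306, 0.2777778,
       1.5510434, 1.0631001, 1.5510434)

# Parameters on the log scale: summarise log(x), not x
mean_logx <- mean(log(x)); sd_logx <- sd(log(x))
mean_logy <- mean(log(y)); sd_logy <- sd(log(y))

# The two orders of operation give different numbers
c(mean_of_log = mean_logx, log_of_mean = log(mean(x)))
```

The resulting values could then be passed as meanlog and sdlog to a lognormal sampler such as rlnorm.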

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx op inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-help-bounces op r-project.org [mailto:r-help-bounces op r-project.org] On behalf of Tom Cohen
Sent: Tuesday 1 April 2008 14:17
To: r-help op stat.math.ethz.ch
Subject: [R] set the lower bound of normal distribution to 0 ?

Tom Cohen <tom.cohen78 op yahoo.se> wrote:

Thanks Prof Brian for your suggestion. 
I should have known that for right-skewed data
one should generate the samples from a lognormal. 

My problem is that x and y are two instruments that were thought to 
measure the same thing, but the confidence interval for the difference
between the two instruments is wide. It may be true that the two
instruments measure differently, but the wide interval could also be
due to the small number of observations. So the idea is that if I
increase the sample size, by generating samples based on the means and
standard deviations of x and y, I may get better precision for the
difference between the two instruments.

I am using 'urlnorm', which allows sampling from a
truncated distribution, since I want the samples 
to take values from 0 to max(x) and from 0 to max(y), respectively. 
I am unsure how to specify the means and standard deviations
in 'urlnorm'. Based on the x- and y-values I have the standard deviations
sd_x=0.3372137, sd_y=0.5120841 and the means mean_x=0.3126667, 
mean_y=0.4223137, which are not on the log scale as required by urlnorm.

To convert sd_x, sd_y and mean_x, mean_y to the log scale I did
sd_logx=sqrt(log(1.3372137))=0.54, sd_logy=sqrt(log(1.5120841))=0.64,
mean_logx=-(0.54^2)/2 and mean_logy=-(0.64^2)/2. Can anyone tell me whether these 
are calculated correctly? Are these the values to be specified in urlnorm?
Do the lower and upper bounds have to be on the log scale as well,
or on which scale?
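For comparison, when only the raw-scale mean m and standard deviation s are available, the standard moment-matching identities for a lognormal are sdlog^2 = log(1 + (s/m)^2) and meanlog = log(m) - sdlog^2/2. The values above instead use log(1 + s) and implicitly a mean of 1. A minimal sketch of the identities; the helper name lnorm_moments is hypothetical, and this is an alternative to estimating the parameters from the log-transformed data directly.

```r
# Hypothetical helper: lognormal parameters matched to a raw-scale mean and sd
lnorm_moments <- function(m, s) {
  sdlog2 <- log(1 + (s / m)^2)          # variance on the log scale
  c(meanlog = log(m) - sdlog2 / 2, sdlog = sqrt(sdlog2))
}

# Raw-scale summaries quoted in the message
px <- lnorm_moments(0.3126667, 0.3372137)
py <- lnorm_moments(0.4223137, 0.5120841)

# Round trip: the implied raw-scale mean should reproduce the input mean
exp(px[["meanlog"]] + px[["sdlog"]]^2 / 2)
```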

   library(Runuran)  # provides urlnorm()
   set.seed(7)
   for (i in 1:len) {
     s1[[i]] <- cbind.data.frame(
       x = urlnorm(n * i, meanlog = mean_logx, sdlog = sd_logx, lb = 0, ub = max(x)),
       y = urlnorm(n * i, meanlog = mean_logy, sdlog = sd_logy, lb = 0, ub = max(y)))
   }

  Thanks again for any suggestions.

Prof Brian Ripley <ripley op stats.ox.ac.uk> wrote:
  On Thu, 27 Mar 2008, Tom Cohen wrote:

>
> Dear list,

> I have a dataset containing values obtained from two different 
> instruments (x and y). I want to generate 5 samples from normal 
> distribution for each instrument based on their means and standard 
> deviations. The problem is values from both instruments are 
> non-negative, so if using rnorm I would get some negative values. Is 
> there any option to set the lower bound of the normal distribution to 
> 0, or can I simulate the samples in a different way to avoid the 
> negative values?

Well, that would not be a normal distribution.

If you want a _truncated_ normal distribution it is very easy by 
inversion. E.g.

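trunc_rnorm <- function(n, mean = 0, sd = 1, lb = 0)
{
    # probability mass of the untruncated normal below the lower bound
    lb <- pnorm(lb, mean, sd)
    # draw uniforms on (lb, 1) and map them back through the quantile function
    qnorm(runif(n, lb, 1), mean, sd)
}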

but I suggest you may rather want samples from a lognormal.
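The same inversion idea carries over to a lognormal restricted to an interval, using only base R. This is a sketch under assumptions: the helper name trunc_rlnorm is hypothetical, the meanlog and sdlog values in the usage lines are purely illustrative, and the bounds are given on the original (not the log) scale because plnorm and qlnorm work on that scale.

```r
# Hypothetical helper: lognormal draws restricted to [lb, ub] by inversion
trunc_rlnorm <- function(n, meanlog = 0, sdlog = 1, lb = 0, ub = Inf) {
  p_lo <- plnorm(lb, meanlog, sdlog)    # CDF at the lower bound
  p_hi <- plnorm(ub, meanlog, sdlog)    # CDF at the upper bound
  # uniforms between the two CDF values, mapped back to the original scale
  qlnorm(runif(n, p_lo, p_hi), meanlog, sdlog)
}

set.seed(7)
# Illustrative parameters only; ub mimics the largest x value in the data
z <- trunc_rlnorm(1000, meanlog = -1.5, sdlog = 0.8, ub = 1.032)
range(z)
```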

>
>
> > dat
>     id     x         y
> 75 101 0.134 0.1911315
> 79 102 0.170 0.1610306
> 76 103 0.134 0.1911315
> 84 104 0.170 0.1610306
> 74 105 0.134 0.1911315
> 80 106 0.170 0.1610306
> 77 107 0.134 0.1911315
> 81 108 0.170 0.1610306
> 82 109 0.170 0.1610306
> 78 111 0.170 0.1610306
> 83 112 0.170 0.1610306
> 85 113 0.097 0.2777778
> 2  201 1.032 1.5510434
> 1  202 0.803 1.0631001
> 5  203 1.032 1.5510434
>
> mu<-apply(dat[,-1],2,mean)
> sigma<-apply(dat[,-1],2,sd)
> len<-5
> n<-20
> s1<-vector("list",len)
> set.seed(7)
> for(i in 1:len){
> s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
> y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
> }
>
> Thanks for any help,
> Tom

-- 
Brian D. Ripley, ripley op stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
