[R] Gamma Simulation from Exponential Distribution: find sample size given power and sig.Level

Mon Oct 29 21:00:04 CET 2018

If this is homework, then r-help has a no homework policy. I'm assuming that if it is homework then the focus is on statistical concepts, not on R programming.

It looks like your gen_p_vals function should be defined as

    gen_p_vals <- function(reps = n)

instead of n = reps.

Why not just use the rgamma() function to simulate your gamma distributions?
Why not just use the t.test() function do calculate the t test?
But if you really want to calculate the t test manually, use the var() function in the denominator, instead of that x-mean(x) business.
It's unnecessary to use return().
Initializing Z.list outside the gamma.sim function does no good.
Are you using the variable "n" for different purposes in different places? If so, it's confusing.

Although it didn't appear to cause problems this time, sending html formatted email to r-help is generally not a good idea.

(and real names are preferred)

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509

On 10/28/18, 1:26 PM, "R-help on behalf of Black Yellow" <r-help-bounces using r-project.org on behalf of blackish952 using gmail.com> wrote:

    I built my function to simulate two gamma distributions X and Y based on the
    sum of i.i.d exponential distributions. Assume my code is correct about this
    simulation, I am interested in finding an equal sample size n for X and Y
    such that n can be determined given 90% power and 5% significance level.

    My thought process is that I create a sequence of n's value and use a for
    loop to reiterate each n value in the sequence along with an "if" statement
    saying that if given the n value, 95% of the time when I replicate the
    process, the probability that I reject the null hypothesis given the
    alternative is true is 90%.

    The problem I am facing is that n value is always the maximum within my
    n_seq. I am doubting there must be something wrong when I generate my two
    Gamma distributions.

    Any input is appreciated.

        set.seed(3701)
        n_seq = seq(from = 1, to = 200, by = 1)
        Z.list = numeric(reps)
        var.list = numeric(reps)

        gamma.sim = function(n, mu){
          for(j in 1:5e5){
              u.list = runif(n = n)
              x.list = -mu*log(1-u.list)
              Z.list[j] = sum(x.list)
          }
          return(Z.list)
        }

        # Generate n generations for X.gamma and Y.gamma
        gen_p_vals <- function(n = reps){
          p_vec <- numeric(reps)
          for(ii in 1:reps){
            x <- gamma.sim(n = 79, mu = 1)
            y <- gamma.sim(n = 80, mu = 1)
            t_stat_numerator <- mean(x) - mean(y) - 0 # Test Statistic under
    null hypothesis
            s_p <- sqrt(sum((x-mean(x))^2) + sum((y-mean(y))^2)) /
    (sqrt(2*reps-2))
            t_stat <- t_stat_numerator /(s_p*(sqrt((1/reps)+(1/reps))))
            p_vec[ii] <- 2* pt(-abs(t_stat), 2*reps-2)
          }
          return(p_vec)
        }
        power_pvals <- gen_p_vals(n = 10)
        mean(power_pvals < 0.05)

        for(j in 2:100){
          power_pvals = gen_p_vals(n = n_seq[j])
          if (mean(power_pvals < 0.05) == 0.90){
            print(j)
          }
        }

    	[[alternative HTML version deleted]]

    ______________________________________________
    R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.