[R] Confidence Interval

Wed Aug 20 05:16:13 CEST 2008

On Tue, 19 Aug 2008, Raphael Saldanha wrote:

> Hi!
>
> With the following script, I'm trying to make a demonstration of a
> Confidence Interval, but I'm observing some differences on tails.

You need to tell us what you are trying to do.  You seem to be computing a 
parametric bootstrap interval, but incorrectly (you need to reflect 
percentile intervals).  See Davison & Hinkley (1997) 'The Bootstrap and 
its Application' for more details.

In any case, your simulation is not repeatable (you set no seed), so we 
don't know what you saw and what 'differences' disturbed you.

Your calculation of quantiles is not quite correct: use quantile().
(The indices go from 1 to 1000, not 0 to 1000.)  E.g.

> quantile(1:1000, c(0.025, 0.975))
    2.5%   97.5%
  25.975 975.025

and not 25, 975.

As your results show, a histogram is not a good summary of the shape of a 
distribution on 1000 points.  We can do much better with an ecdf or a 
density estimate.

>
> # Teste de m?dia entre uma amostra e uma popula??o normal
> # Autor: Raphael de Freitas Saldanha
> # Agosto de 2008
>
> n    <- 200    # Sample size
> xbar <- 100    # Sample mean
> s    <- 2      # Sample SD
> nc   <- 0.95   # Confidence level (95% -> 0.95)
> rep  <- 1000   # Loops
>
> #######################################################################
>
> y <- NULL                # Vetor com as m?dias da amostra
> for (i in 1:rep){        # Loop
> x <- rnorm(n,xbar,s)     # Gere uma amostra normal n elementos, xbar m?dia e
> s desvio-padr?o
> x <- mean(x)             # Calcule a m?dia (exata) dessa amostra
> y <- c(y,x)              # Coloque essa m?dia em um registro em y
> }
>
> y <- sort(y)             # Ordene todas as m?dias geradas
>
> LI <- y[((1-nc)/2)*rep]         # Limite inferior,
> LS <- y[rep-(((1-nc)/2)*rep)]   # Limite superior
>
> ### PARTE GR?FICA ###
>
> x <- mean(y)
>
> xvals <- seq(-LI, LI, length.out=5000)
> dvals <- dnorm(xvals,mean(y), sd(y))[1:5000]
>
> xbvals <- seq(LS, LS*2, length.out=5000)
> dbvals <- dnorm(xbvals,mean(y), sd(y))[1:5000]
>
> ahist <- hist(y, freq=FALSE, col="lightblue", main="Intervalo de confian?a")
>
> polygon(c(xvals,LI,LI), c(dvals,dnorm(LI,mean(y), sd(y)),min(dvals)),
> col="orange", lty=0)
> polygon(c(LS,LS,xbvals), c(min(dbvals),dnorm(LS,mean(y), sd(y)),dbvals),
> col="orange", lty=0)
> curve(dnorm(x,mean(y), sd(y)),add=TRUE, lty=1, col="red",lwd=2)
>
> ### Intervalo de Confian?a ###
>
> LI # Limite inferior
> LS # Limite superior
>
> -- 
> Atenciosamente,
>
> Raphael Saldanha
> saldanha.plangeo at gmail.com
>
> 	[[alternative HTML version deleted]]
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595