[R] proportions confidence intervals

Mon Jul 12 19:37:55 CEST 2004

Darren Shaw wrote:

> this may be a simple question - but i would appreciate any thoughts
> 
> does anyone know how you would get one lower and one upper confidence 
> interval for a set of data that consists of proportions.  i.e. taking a 
> usual confidence interval for normal data would result in the lower 
> confidence interval being negative - which is not possible given the data 
> (which is constrained between 0 and 1)
> 
> i can see how you calculate an upper and lower confidence interval for a 
> single proportion, but not for a set of proportions

(1) Your question appears to be a bit ``off topic''.  I.e. it is
really about statistical methodology, rather than about how to
implement methodology in R.

(2) You need to make the scenario clearer.  What do your data
actually consist of?  What are you assuming?

The only reasonable scenario that springs to mind (perhaps this is
merely indicative of poverty of imagination on my part) is that you
have a number of ***independent*** samples, each yielding a sample
proportion, and each coming from the same population (or at least
from populations having the same population proportion ``p'').  I.e.
you have p.hat_1, ..., p.hat_n and from these you wish to calculate a
confidence interval for p.

You need to know the sample ***sizes*** for each sample.  If you
don't, you're screwed.  Full stop.  There is absolutely nothing
sensible you can do.  If you ***do*** know the sample sizes (say k_1,
..., k_n) then the problem is trivial.

You have p.hat_j = x_j/k_j for j = 1, ..., n, where x_j is the
number of ``successes'' observed in sample j.

Let x = x_1 + ... + x_n  and k = k_1 + ... + k_n.
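If only the proportions and the sample sizes were recorded, the counts
can be recovered as x_j = k_j * p.hat_j, so x/k is just the average of
the p.hat_j weighted by the k_j.  A rough sketch in R, using made-up
numbers (the vectors p.hat.j and k below are purely hypothetical):

```r
# Hypothetical data: three sample proportions and their sample sizes.
p.hat.j <- c(0.15, 0.20, 0.40)
k       <- c(20, 25, 15)

# Recover the success counts; round() guards against floating-point fuzz.
x <- round(p.hat.j * k)

# Pooled proportion x/k, computed two equivalent ways.
pooled   <- sum(x) / sum(k)
weighted <- weighted.mean(p.hat.j, w = k)

# The plain (unweighted) mean of the proportions is generally different.
unweighted <- mean(p.hat.j)

print(c(pooled = pooled, weighted = weighted, unweighted = unweighted))
```

This is why the sample sizes matter: without them there is no way to
choose the weights.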

Form p.hat = x/k.  (I.e. you ***really*** just have one big
happy sample.)  Then calculate the confidence interval for p
in the usual way:

	p.hat +/- (z-value) * sqrt(p.hat * (1 - p.hat)/k)
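
The formula above might be coded in R roughly as follows.  This is
only a sketch: the function name pooled_ci and the data are invented
for illustration, and the z-value is taken from qnorm() for the
requested confidence level.

```r
# Hypothetical data: successes x_j and sample sizes k_j for four samples.
x <- c(3, 5, 2, 7)
k <- c(20, 25, 15, 30)

# Normal-approximation interval for p from the pooled sample.
# pooled_ci is an illustrative name, not an existing R function.
pooled_ci <- function(x, k, conf.level = 0.95) {
  stopifnot(length(x) == length(k), all(x >= 0), all(x <= k))
  X     <- sum(x)                          # total successes
  K     <- sum(k)                          # total sample size
  p.hat <- X / K                           # pooled proportion
  z     <- qnorm(1 - (1 - conf.level) / 2) # e.g. about 1.96 for 95%
  half  <- z * sqrt(p.hat * (1 - p.hat) / K)
  c(estimate = p.hat, lower = p.hat - half, upper = p.hat + half)
}

ci <- pooled_ci(x, k)
print(ci)
```

Note that this interval can still dip below 0 (or above 1) when p.hat
is close to the boundary and k is small.  In that case something like
binom.test(sum(x), sum(k))$conf.int, which gives an exact
(Clopper-Pearson) interval, stays inside [0, 1].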

If this is not the scenario with which you need to cope, then
you'll have to explain what that scenario actually is.

				cheers,

					Rolf Turner
					rolf at math.unb.ca