[R] confidence interval of a average...

Robert W. Baer, Ph.D. rbaer at atsu.edu
Thu Nov 25 03:05:37 CET 2004


> Sorry if this was not clear.  This is more of a theoretical question
> rather than an R-coding question.  I need to calculate
>
> "The predicted response and 95% prediction interval for a man of average 
> height"
>
> So I need to predict the average response, which is easily done by taking 
> the mean height and using the regression formula.
>
> However, "average height" has to be calculated from the sample, and thus I 
> have confidence in that.  Let's say the mean is 163cm, I think that I 
> can't take the 163cm value and calculate the CI from just the sd of the 
> lung capacity because that would be too narrow; I think covariance must 
> come into it somehow, or can I just do a 97.5% CI on the height and take 
> those extreme values and do a 97.5% CI on them?

Then you want the interval on the mean VC, which is the tighter of the
two intervals and does not include the extra variability of VC about its
mean.  As always with confidence intervals, you are free to look at either
a 95% CI or a 97.5% CI, depending on what kind of statement you'd like to
make about your confidence.  I do not understand your comment about
covariance at all.

Let me try again with data in your units.  Note that the width of the CI
varies with height and is smallest at the mean height, whether you are
talking about the CI on the mean VC or the CI on the predicted VC.  For
comparison, the red lines are the 95% CI on the mean (fitted) VC and the
blue lines are the 95% CI on the "predicted" VC.  The simulated heights
are scattered around a mean of about 163 cm.
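
As a hedged illustration of why both intervals are narrowest at the mean
height, the sketch below recomputes the half-widths by hand from the
textbook formulas for simple linear regression.  The seed, the names fit,
x0, ci.half and pi.half, and the three evaluation heights are my own
choices for illustration and are not part of the original example.

```r
# Illustrative sketch: hand-computed interval half-widths.
# Seed and variable names are assumptions, chosen only for this example.
set.seed(1)
height <- sort(rnorm(50, mean = 163, sd = 35))
vc <- 0.03 * height + 0.5 * rnorm(50)
fit <- lm(vc ~ height)

n    <- length(height)
s    <- summary(fit)$sigma                # residual standard error
sxx  <- sum((height - mean(height))^2)    # sum of squares of height about its mean
tval <- qt(0.975, df = n - 2)             # two-sided 95% critical value

# Heights at which to evaluate the intervals; the middle one is the sample mean
x0 <- c(min(height), mean(height), max(height))

# Half-width of the CI on the mean vc at each x0
ci.half <- tval * s * sqrt(1 / n + (x0 - mean(height))^2 / sxx)

# Half-width of the interval for a single new vc at each x0
pi.half <- tval * s * sqrt(1 + 1 / n + (x0 - mean(height))^2 / sxx)

cbind(x0, ci.half, pi.half)
```

The (x0 - mean(height))^2 term is zero at the mean height, so both
half-widths are smallest there, and the extra 1 under the square root is
what makes the interval for a new observation wider at every height.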


# Make simulated data with mean height near 163
# vc approximately in liter values with scatter
height=sort(rnorm(50,mean=163,sd=35))
vc=0.03*height+.5*rnorm(50)

#Plot the simulated data
plot(vc~height,ylab='vital capacity (l)',xlab='Height (cm)')

# Set up data frame with values of height you wish a ci on
# column heading must be same as for lm() fit x variable
# in this case, dataframe contains only mean height
mean.height.fit.ci=data.frame(height=mean(height))

#print out the mean height
mean.height.fit.ci

# fit the regression model
vc.lm=lm(vc~height)

# Draw 95% confidence intervals on mean vc at various heights (red);
# narrowest at mean(height)
matlines(height,predict.lm(vc.lm,interval="c"),lty=c(1,2,2), 
col=c('black','red','red'))

# Draw 95% prediction intervals on new vc at various heights (blue);
# again narrowest at mean(height)
matlines(height,predict.lm(vc.lm,interval="p"),lty=c(1,3,3), 
col=c('black','blue','blue'))

# Determine 95% CI on mean vc at mean height
predict.lm(vc.lm,mean.height.fit.ci,interval="confidence")

# Determine 97.5% CI on mean vc at mean height
predict.lm(vc.lm,mean.height.fit.ci,interval="confidence", level=0.975)
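
If what is wanted is instead an interval for the VC of one new man of
average height, rather than for the mean VC at that height, the same call
with interval="prediction" should give it.  A minimal sketch follows; it
re-simulates the data so that it runs on its own, and the seed and the
names new.man and pred.int are assumptions made for illustration.

```r
# Illustrative sketch: confidence vs. prediction interval at the mean height.
# Seed and the names new.man / pred.int are assumptions for this example.
set.seed(1)
height <- sort(rnorm(50, mean = 163, sd = 35))
vc <- 0.03 * height + 0.5 * rnorm(50)
vc.lm <- lm(vc ~ height)

# Data frame holding the single height of interest (column name must match the fit)
new.man <- data.frame(height = mean(height))

# 95% interval on the mean vc at the mean height
ci <- predict(vc.lm, new.man, interval = "confidence", level = 0.95)

# 95% interval for the vc of one new individual at the mean height
pred.int <- predict(vc.lm, new.man, interval = "prediction", level = 0.95)

# Side-by-side comparison: same fit, different lower and upper limits
rbind(confidence = ci[1, ], prediction = pred.int[1, ])
```

The two fitted values agree (for a least-squares fit with an intercept
they equal mean(vc)); only the width differs, because the prediction
interval also includes the scatter of individual VC values about the
regression line.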


You might wish to read a little more about regression CIs in a good 
statistics book.

HTH,
Rob



