---
title: "Hypothesis test for a proportion"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Hypothesis test for a proportion}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,comment=NA,fig.width=7,fig.height=5)
library(interpretCI)
library(glue)
```

```{r,echo=FALSE,message=FALSE}
x=propCI(n=100,p=0.73,P=0.8,alpha=0.01)

two.sided<-greater<-less<-FALSE
if(x$result$alternative=="two.sided") two.sided=TRUE
if(x$result$alternative=="less") less=TRUE
if(x$result$alternative=="greater") greater=TRUE

twoS="The null hypothesis will be rejected if the sample proportion is too big or if it is too small."
lessS="The null hypothesis will be rejected if the sample proportion is too small."
greaterS="The null hypothesis will be rejected if the sample proportion is too big."
```

This document is prepared automatically using the following R command.

```{r,echo=FALSE}
call=paste0(deparse(x$call),collapse="")
x1=paste0("library(interpretCI)\nx=",call,"\ninterpret(x)")
textBox(x1,italic=TRUE,bg="grey95",lcolor="grey50")
```

## Problem


```{r,echo=FALSE}

string=glue("The CEO of a large electric utility claims that {x$result$P*100} percent of his {x$result$n*100} customers are very satisfied with the service they receive. To test this claim, the local newspaper surveyed {x$result$n} customers, using simple random sampling. Among the sampled customers, {x$result$p*100} percent say they are very satisfied. Based on these findings, can we reject the CEO's hypothesis that {x$result$P*100}% of the customers are very satisfied? Use a {x$result$alpha} level of significance.")

textBox(string)
```


## Conditions for a hypothesis test of a proportion

The approach that we used to solve this problem is valid when the following conditions are met.

- The sampling method must be **simple random sampling**. This condition is satisfied; the problem statement says that we used simple random sampling.

- Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.

- The sample should include at least 10 successes and 10 failures. Suppose we classify a "very satisfied" response as a success, and any other response as a failure. Then, we have `r x$result$p` $\times$ `r x$result$n` = `r x$result$p*x$result$n` successes, and `r 1-x$result$p` $\times$ `r x$result$n` = `r (1-x$result$p)*x$result$n` failures, which is plenty of both.

- The population size is at least 20 times as big as the sample size. If the population size is much larger than the sample size, we can use an **approximate** formula for the standard deviation or the standard error. This condition is satisfied, so we will use one of the simpler **approximate** formulas.
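
The two numeric conditions above can be checked directly in base R. This is a minimal sketch and not part of interpretCI: the variable names are illustrative, the sample values mirror the `propCI()` call used in this document, and the population size is an assumption taken from the problem statement.

```{r}
# Illustrative values mirroring propCI(n=100, p=0.73, P=0.8, alpha=0.01)
n_obs <- 100     # sample size
p_hat <- 0.73    # sample proportion
N_pop <- 10000   # assumed population size (from the problem statement)

n_success <- p_hat * n_obs
n_failure <- (1 - p_hat) * n_obs

# Condition: at least 10 successes and 10 failures
enough_counts <- n_success >= 10 && n_failure >= 10
# Condition: population at least 20 times the sample size
large_population <- N_pop >= 20 * n_obs

c(enough_counts = enough_counts, large_population = large_population)
```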


## Solution

This approach consists of four steps:

- state the hypotheses

- formulate an analysis plan

- analyze sample data

- interpret results.


### 1. State the hypotheses 

The first step is to state the null hypothesis and an alternative hypothesis.

$$Null\ hypothesis(H_0): P `r ifelse(two.sided,"=",ifelse(less,"\\geq","\\leq"))` `r x$result$P`$$
$$Alternative\ hypothesis(H_1): P `r ifelse(two.sided, "\\neq" ,ifelse(less,"<",">"))` `r x$result$P`$$

Note that these hypotheses constitute a `r ifelse(two.sided,"two","one")`-tailed test. `r ifelse(two.sided,twoS,ifelse(less,lessS,greaterS))`

### 2. Formulate an analysis plan

For this analysis, the significance level is `r x$result$alpha`, which corresponds to a `r (1-x$result$alpha)*100`% confidence level. The test method, shown in the next section, is a **one-sample z-test**.

### 3. Analyze sample data

Using sample data, we calculate the standard deviation (sd) and compute the z-score test statistic (z).

$$sd=\sqrt{\frac{P\times(1-P)}{n}}$$
$$sd=\sqrt{\frac{`r x$result$P`\times(1-`r x$result$P`)}{`r x$result$n`}}=`r x$result$se`$$
$$z=\frac{p-P}{sd}=\frac{`r x$result$p`-`r x$result$P`}{`r x$result$se`}=`r x$result$z`$$
where $P$ is the hypothesized value of the population proportion in the null hypothesis, $p$ is the sample proportion, and $n$ is the sample size.
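
As a hedged worked example, the same arithmetic can be reproduced in base R. The inputs are typed in by hand rather than read from `x$result`, and the names `P0`, `p_hat`, `n_obs`, `sd_p0`, and `z_stat` are illustrative only.

```{r}
P0    <- 0.8    # hypothesized population proportion (null hypothesis)
p_hat <- 0.73   # sample proportion
n_obs <- 100    # sample size

# Standard deviation of the sample proportion under the null hypothesis
sd_p0 <- sqrt(P0 * (1 - P0) / n_obs)
# z-score test statistic
z_stat <- (p_hat - P0) / sd_p0

round(c(sd = sd_p0, z = z_stat), 4)
```

The standard deviation uses the hypothesized $P$ rather than the sample $p$ because the test statistic is computed under the assumption that the null hypothesis is true.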

Since we have a `r ifelse(two.sided,"two","one")`-tailed test, the P-value is the probability that the z statistic is `r if(two.sided) glue("less than {round(-abs(x$result$z),2)} or greater than {round(abs(x$result$z),2)}") else if(less) glue("less than {round(x$result$z,2)}") else glue("greater than {round(x$result$z,2)}")`.

We can use the following R code to find the P-value.

```{r,echo=FALSE}

# Build the pnorm() expression that matches the alternative hypothesis
if(two.sided){
  string=glue("pnorm(-abs({x$result$z}))\\times2")
} else if(greater){
  string=glue("pnorm({x$result$z},lower.tail=FALSE)")
} else{
  string=glue("pnorm({x$result$z})")
}
```

$$p=`r string`=`r round(x$result$pvalue,3)`$$
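
The three cases can be collected into one small function. This is a sketch under stated assumptions: `z_pvalue()` is a hypothetical helper written for illustration, not a function exported by interpretCI, and it relies only on `pnorm()` from base R.

```{r}
# Hypothetical helper (not part of interpretCI): p-value of a z statistic
z_pvalue <- function(z, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),              # both tails
    less      = pnorm(z),                        # lower tail
    greater   = pnorm(z, lower.tail = FALSE)     # upper tail
  )
}

round(z_pvalue(-1.75), 4)           # two-sided
round(z_pvalue(-1.75, "less"), 4)   # one-sided, lower tail
```

For a symmetric distribution such as the standard normal, the two-sided p-value is twice the smaller one-sided p-value.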


Alternatively, we can use the normal distribution curve to find the P-value.

```{r}
draw_n(z=x$result$z,alternative=x$result$alternative)
```


### 4. Interpret results. 

Since the P-value (`r round(x$result$pvalue,3)`) is `r ifelse(x$result$pvalue>x$result$alpha,"greater","less")` than the significance level (`r x$result$alpha`), we can`r if(x$result$pvalue>x$result$alpha) "not"` reject the null hypothesis.
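
The decision rule can also be written out explicitly. The sketch below assumes a two-sided test with the values used in this document typed in by hand; the variable names are illustrative. It shows that comparing the P-value with the significance level gives the same decision as comparing the absolute z statistic with the critical value from `qnorm()`.

```{r}
alpha_level <- 0.01    # significance level
z_stat      <- -1.75   # z statistic from step 3 (assumed two-sided test)

# Decision based on the P-value
p_value     <- 2 * pnorm(-abs(z_stat))
reject_by_p <- p_value < alpha_level

# Equivalent decision based on the two-sided critical value
z_crit      <- qnorm(1 - alpha_level / 2)
reject_by_z <- abs(z_stat) > z_crit

c(reject_by_p = reject_by_p, reject_by_z = reject_by_z)
```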


### Result of propCI()

```{r,echo=FALSE}
print(x)
```
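
As an independent cross-check, base R's `prop.test()` without continuity correction should agree with the z-test above, because its chi-squared statistic for a single proportion is the square of the z statistic. This is a sketch with the counts typed in by hand (73 successes out of 100 is assumed from the sample proportion); `chk` is an illustrative name.

```{r}
# Cross-check with stats::prop.test(), continuity correction turned off
chk <- prop.test(x = 73, n = 100, p = 0.8,
                 alternative = "two.sided", correct = FALSE)

sqrt(unname(chk$statistic))   # absolute value of the z statistic
chk$p.value                   # two-sided P-value
```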

### Reference

The contents of this document are modified from StatTrek.com.
Berman H.B., "AP Statistics Tutorial", [online] Available at: https://stattrek.com/hypothesis-test/proportion.aspx?tutorial=AP [Accessed: 1/23/2022].