--- title: "Mixed Count and Continuous Co-Primary Endpoints" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Mixed Count and Continuous Co-Primary Endpoints} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` ## Overview This vignette demonstrates sample size calculation and power analysis for clinical trials with two co-primary endpoints: an overdispersed count outcome (following negative binomial distribution) and a continuous outcome (following normal distribution). The methodology is based on Homma and Yoshida (2024). ```{r setup, message=FALSE, warning=FALSE} library(twoCoprimary) library(dplyr) library(tidyr) library(knitr) ``` ## Background ### Clinical Context In clinical trials for treating **asthma** and **chronic obstructive pulmonary disease (COPD)**, regulatory agencies require evaluation of both: 1. **Count endpoint**: Number of exacerbations over follow-up period (overdispersed count data) 2. **Continuous endpoint**: Lung function measure such as $\text{FEV}_{1}$ (forced expiratory volume in 1 second) ### Why Negative Binomial Distribution? Count outcomes in clinical trials often exhibit **overdispersion**, where the variance substantially exceeds the mean. The Poisson distribution assumes variance = mean, which is often violated. **Negative binomial (NB) distribution** accommodates overdispersion: $$X \sim \text{NB}(\lambda, \nu)$$ - **Mean**: $\text{E}[X] = \lambda = r \times t$ (rate $\times$ time) - **Variance**: $\text{Var}[X] = \lambda + \lambda^{2}/\nu$ - **Dispersion parameter**: $\nu > 0$ controls overdispersion - As $\nu \to \infty$: $\text{NB} \to \text{Poisson}$ (no overdispersion) - Small $\nu$: High overdispersion (variance $\gg$ mean) **Variance-to-mean ratio (VMR)**: $$\text{VMR} = \frac{\text{Var}[X]}{\text{E}[X]} = 1 + \frac{\lambda}{\nu}$$ When $\text{VMR} > 1$, data are overdispersed and NB is more appropriate than Poisson. ## Statistical Framework ### Model and Assumptions Consider a two-arm superiority trial with sample sizes $n_{1}$ (treatment) and $n_{2}$ (control), with allocation ratio $r = n_{1}/n_{2}$. For subject $i$ in group $j$ ($j = 1$: treatment, $j = 2$: control), we observe two **outcomes**: **Outcome 1 (Count)**: $$X_{i,j,1} \sim \text{NB}(\lambda_{j}, \nu)$$ where $\lambda_{j}$ is the expected number of events in group $j$, and $\nu > 0$ is the common dispersion parameter. **Outcome 2 (Continuous)**: $$X_{i,j,2} \sim \text{N}(\mu_{j}, \sigma^{2})$$ where $\mu_{j}$ is the population mean in group $j$, and $\sigma^{2}$ is the common variance across groups. ### Correlation Structure The correlation between count and continuous outcomes within subjects is measured by $\rho_{j}$ in group $j$. This correlation must satisfy **feasibility constraints** based on the Fréchet-Hoeffding copula bounds, which depend on the marginal distributions. Use the `corrbound2MixedCountContinuous()` function to check valid correlation bounds for given parameters. ### Hypothesis Testing We test superiority of treatment over control for both endpoints. **Note**: In COPD/asthma trials, treatment benefit is indicated by **lower values** for both endpoints (fewer exacerbations, less decline in lung function). **For count endpoint** (1): $$\text{H}_{01}: r_{1} \geq r_{2} \text{ vs. } \text{H}_{11}: r_{1} < r_{2}$$ Equivalently, testing $\beta_{1} = \log(r_{1}) - \log(r_{2}) < 0$. 
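The following chunk is a small, stand-alone illustration (base R only, not part of the package workflow): it simulates NB counts and compares the empirical variance-to-mean ratio with the theoretical value $1 + \lambda/\nu$. The parameter values mirror those used in the examples later in this vignette.

```{r vmr_illustration}
# Illustration: overdispersion of negative binomial counts relative to Poisson.
# In rnbinom(), `size` plays the role of the dispersion parameter nu and
# `mu` the mean lambda, so Var(X) = mu + mu^2 / size.
set.seed(123)
lambda <- 1.25   # mean count over the follow-up period
nu     <- 0.8    # dispersion parameter
x <- rnbinom(1e5, size = nu, mu = lambda)

c(empirical_VMR = var(x) / mean(x),
  theoretical_VMR = 1 + lambda / nu)
```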
## Statistical Framework

### Model and Assumptions

Consider a two-arm superiority trial with sample sizes $n_{1}$ (treatment) and $n_{2}$ (control) and allocation ratio $r = n_{1}/n_{2}$. For subject $i$ in group $j$ ($j = 1$: treatment, $j = 2$: control), we observe two **outcomes**:

**Outcome 1 (Count)**:

$$X_{i,j,1} \sim \text{NB}(\lambda_{j}, \nu)$$

where $\lambda_{j}$ is the expected number of events in group $j$, and $\nu > 0$ is the common dispersion parameter.

**Outcome 2 (Continuous)**:

$$X_{i,j,2} \sim \text{N}(\mu_{j}, \sigma^{2})$$

where $\mu_{j}$ is the population mean in group $j$, and $\sigma^{2}$ is the common variance across groups.

### Correlation Structure

The correlation between the count and continuous outcomes within subjects is measured by $\rho_{j}$ in group $j$. This correlation must satisfy **feasibility constraints** based on the Fréchet-Hoeffding copula bounds, which depend on the marginal distributions. Use the `corrbound2MixedCountContinuous()` function to check the valid correlation bounds for given parameters.

### Hypothesis Testing

We test superiority of treatment over control for both endpoints. **Note**: In COPD/asthma trials, treatment benefit is indicated by **lower values** for both endpoints (fewer exacerbations, less decline in lung function).

**For count endpoint** (1):

$$\text{H}_{01}: r_{1} \geq r_{2} \text{ vs. } \text{H}_{11}: r_{1} < r_{2}$$

Equivalently, $\text{H}_{11}$ states that the log rate ratio $\beta_{1} = \log(r_{1}) - \log(r_{2})$ is negative.

**For continuous endpoint** (2):

$$\text{H}_{02}: \mu_{1} - \mu_{2} \geq 0 \text{ vs. } \text{H}_{12}: \mu_{1} - \mu_{2} < 0$$

**Co-primary endpoints** (intersection-union test):

$$\text{H}_0 = \text{H}_{01} \cup \text{H}_{02} \text{ vs. } \text{H}_1 = \text{H}_{11} \cap \text{H}_{12}$$

Reject $\text{H}_{0}$ at level $\alpha$ if and only if **both** $\text{H}_{01}$ and $\text{H}_{02}$ are rejected at level $\alpha$.

### Test Statistics

**Count endpoint** (Equation 7 in Homma and Yoshida, 2024):

$$Z_{1} = \frac{\hat{\beta}_{1}}{\sqrt{\text{Var}(\hat{\beta}_{1})}}$$

where:

- $\hat{\beta}_{1} = \log(\bar{X}_{1,1}) - \log(\bar{X}_{2,1})$ is the estimated log rate ratio
- $\text{Var}(\hat{\beta}_{1}) = \frac{1}{n_{2}}\left[\frac{1}{\lambda_{2}} + \frac{1}{r\lambda_{1}} + \frac{1+r}{\nu r}\right] = \frac{V_{a}}{n_{2}}$, where $\lambda_{j} = r_{j} t$

**Continuous endpoint**:

$$Z_{2} = \frac{\bar{X}_{1,2} - \bar{X}_{2,2}}{\sigma\sqrt{\frac{1+r}{r n_{2}}}}$$

where $\sigma$ is the common, known standard deviation.

### Joint Distribution and Correlation

Under $\text{H}_{1}$, the test statistics $(Z_{1}, Z_{2})$ asymptotically follow a bivariate normal distribution (Appendix B in Homma and Yoshida, 2024):

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim \text{BN}\left(\begin{pmatrix} \omega_1 \\ \omega_2 \end{pmatrix}, \begin{pmatrix} 1 & \gamma \\ \gamma & 1 \end{pmatrix}\right)$$

where:

- $\omega_{1} = \frac{\sqrt{n_{2}}\beta_{1}}{\sqrt{V_{a}}}$ with $\beta_{1} = \log(r_{1}) - \log(r_{2})$
- $\omega_{2} = \frac{\delta}{\sigma\sqrt{\frac{1+r}{r n_{2}}}}$ with $\delta = \mu_{1} - \mu_{2}$

**Correlation between test statistics** (Equation 11 in Homma and Yoshida, 2024):

$$\gamma = \sum_{j=1}^{2} \frac{n_{2}\rho_{j}\sqrt{1+\lambda_{j}/\nu}}{n_{j}\sqrt{\lambda_{j}V_{a}(1+r)/r}}$$

For a **balanced design** ($r = 1$) with **common correlation** ($\rho_{1} = \rho_{2} = \rho$):

$$\gamma = \frac{\rho}{\sqrt{2}}\frac{\sqrt{1/\lambda_{2} + 1/\nu} + \sqrt{1/\lambda_{1} + 1/\nu}}{\sqrt{1/\lambda_{2} + 1/\lambda_{1} + 2/\nu}}$$

### Power Calculation

The overall power is (Equation 10 in Homma and Yoshida, 2024):

$$1 - \beta = \text{P}(Z_{1} < z_{\alpha} \cap Z_{2} < z_{\alpha} \mid \text{H}_{1})$$

where $z_{\alpha} = \Phi^{-1}(\alpha)$ is the lower $\alpha$-quantile of the standard normal distribution (each one-sided test rejects for small values of $Z_{j}$). Using the bivariate normal CDF $\Phi_{2}$:

$$1 - \beta = \Phi_{2}\left(z_{\alpha} - \frac{\sqrt{n_{2}}(\log r_{1} - \log r_{2})}{\sqrt{V_{a}}}, z_{\alpha} - \frac{\sqrt{n_{2}}(\mu_{1}-\mu_{2})}{\sigma\sqrt{\frac{1+r}{r}}} \Bigg| \gamma\right)$$

### Sample Size Determination

The sample size is determined by solving the power equation numerically. For a given allocation ratio $r$, target power $1 - \beta$, and significance level $\alpha$, we find the smallest $n_{2}$ such that the overall power equals or exceeds $1 - \beta$.

**Sequential search algorithm**:

1. Calculate an initial sample size from single-endpoint formulas
2. Compute the power at the current sample size
3. If power $\geq$ target: decrease $n_{2}$ until power $<$ target, then add 1 back
4. If power $<$ target: increase $n_{2}$ until power $\geq$ target
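To make the formulas above concrete, the chunk below evaluates the power expression directly for a balanced design with a common correlation. This is an illustrative sketch only: the helper `power_coprimary_sketch()` and the trial value `n2 = 700` are invented for this example, and the chunk assumes the **mvtnorm** package is available for the bivariate normal CDF. In practice, use `ss2MixedCountContinuous()` and `power2MixedCountContinuous()` as shown in the examples below.

```{r power_formula_sketch}
# Illustrative direct evaluation of the co-primary power (Equation 10)
# for a balanced design (r = 1) with common correlation rho1 = rho2 = rho.
# Assumes the mvtnorm package is installed.
power_coprimary_sketch <- function(n2, r1, r2, nu, t, mu1, mu2, sd, rho, alpha) {
  lambda1 <- r1 * t
  lambda2 <- r2 * t
  # Variance factor V_a of the log rate ratio (balanced design)
  Va <- 1 / lambda2 + 1 / lambda1 + 2 / nu
  # Means of the two test statistics under H1
  omega1 <- sqrt(n2) * (log(r1) - log(r2)) / sqrt(Va)
  omega2 <- (mu1 - mu2) / (sd * sqrt(2 / n2))
  # Correlation between the two test statistics (balanced, common rho)
  gamma <- rho / sqrt(2) *
    (sqrt(1 / lambda2 + 1 / nu) + sqrt(1 / lambda1 + 1 / nu)) /
    sqrt(1 / lambda2 + 1 / lambda1 + 2 / nu)
  z_alpha <- qnorm(alpha)  # lower alpha-quantile (negative)
  # P(Z1 < z_alpha, Z2 < z_alpha | H1) via the bivariate normal CDF
  mvtnorm::pmvnorm(upper = c(z_alpha - omega1, z_alpha - omega2),
                   corr = matrix(c(1, gamma, gamma, 1), 2, 2))[1]
}

# Power at an arbitrary n2 = 700 per group (parameters as in Example 1 below)
power_coprimary_sketch(n2 = 700, r1 = 1.0, r2 = 1.25, nu = 0.8, t = 1,
                       mu1 = -50, mu2 = 0, sd = 250, rho = 0.5, alpha = 0.025)
```

The sequential search simply scans $n_{2}$ against this power expression until the target power is reached.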
## Correlation Bounds

### Example: Calculate Correlation Bounds

```{r correlation_bounds}
# Scenario: lambda = 1.25, nu = 0.8, mu = 0, sigma = 250
bounds1 <- corrbound2MixedCountContinuous(lambda = 1.25, nu = 0.8, mu = 0, sd = 250)

cat("Correlation bounds for NB(1.25, 0.8) and N(0, 250²):\n")
cat("Lower bound:", round(bounds1[1], 3), "\n")
cat("Upper bound:", round(bounds1[2], 3), "\n\n")

# Higher dispersion (smaller nu) typically restricts bounds
bounds2 <- corrbound2MixedCountContinuous(lambda = 1.25, nu = 0.5, mu = 0, sd = 250)

cat("Correlation bounds for NB(1.25, 0.5) and N(0, 250²):\n")
cat("Lower bound:", round(bounds2[1], 3), "\n")
cat("Upper bound:", round(bounds2[2], 3), "\n\n")

# Higher mean count
bounds3 <- corrbound2MixedCountContinuous(lambda = 2.0, nu = 0.8, mu = 0, sd = 250)

cat("Correlation bounds for NB(2.0, 0.8) and N(0, 250²):\n")
cat("Lower bound:", round(bounds3[1], 3), "\n")
cat("Upper bound:", round(bounds3[2], 3), "\n")
```

**Important**: Always verify that the specified correlation $\rho$ is within the feasible bounds for your parameters.

## Replicating Homma and Yoshida (2024) Table 1 (Case B)

Table 1 of Homma and Yoshida (2024) shows sample sizes and operating characteristics for various scenarios. We replicate **Case B** with $\nu = 3$ and $\nu = 5$.

**Design parameters for Case B**:

- Count rates: $r_{1} = 1$ (treatment), $r_{2} = 2$ (control), $t = 1$ → $\lambda_{1} = 1$, $\lambda_{2} = 2$
- Dispersion: $\nu = 3$ and $5$
- Continuous means: $\mu_{1} = -50$ (treatment), $\mu_{2} = 0$ (control); the lower value indicates less decline
- Standard deviation: $\sigma = 75$
- $\alpha = 0.025$ (one-sided), $1 - \beta = 0.9$ (target power)
- Balanced allocation: $r = 1$ ($n_{1} = n_{2}$)

```{r table1_case_B}
# Define scenarios for Table 1 Case B
scenarios_table1_B <- expand.grid(
  nu = c(3, 5),
  rho = c(0, 0.2, 0.4, 0.6, 0.8),
  stringsAsFactors = FALSE
)

# Calculate sample sizes for each scenario
results_table1_B <- lapply(1:nrow(scenarios_table1_B), function(i) {
  nu_val <- scenarios_table1_B$nu[i]
  rho_val <- scenarios_table1_B$rho[i]

  result <- ss2MixedCountContinuous(
    r1 = 1, r2 = 2, nu = nu_val, t = 1,
    mu1 = -50, mu2 = 0, sd = 75,
    r = 1, rho1 = rho_val, rho2 = rho_val,
    alpha = 0.025, beta = 0.1
  )

  data.frame(
    nu = nu_val,
    rho = rho_val,
    n2 = result$n2,
    n1 = result$n1,
    N = result$N
  )
})

table1_B_results <- bind_rows(results_table1_B)

# Display results grouped by nu
table1_B_nu3 <- table1_B_results %>%
  filter(nu == 3) %>%
  select(rho, n2, N)

table1_B_nu5 <- table1_B_results %>%
  filter(nu == 5) %>%
  select(rho, n2, N)

kable(table1_B_nu3,
      caption = "Table 1 Case B: Sample Sizes (ν = 3, Balanced Design, α = 0.025, 1-β = 0.9)",
      digits = 1,
      col.names = c("ρ", "n per group", "N total"))

kable(table1_B_nu5,
      caption = "Table 1 Case B: Sample Sizes (ν = 5, Balanced Design, α = 0.025, 1-β = 0.9)",
      digits = 1,
      col.names = c("ρ", "n per group", "N total"))
```

**Observations**:

- A higher dispersion parameter $\nu$ (less overdispersion) requires **smaller** sample sizes
- Correlation reduces the sample size: by approximately 2-4% at $\rho = 0.4$ and 5-8% at $\rho = 0.8$
- The effect of correlation is moderate compared to other endpoint combinations
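As an optional cross-check of the table above, the achieved power at one tabulated sample size can be recomputed with `power2MixedCountContinuous()` (the same function used in the Power Verification section below). The scenario chosen here ($\nu = 3$, $\rho = 0.4$) is arbitrary.

```{r table1_case_B_check}
# Recompute the co-primary power for one Case B scenario at its tabulated n
check_row <- table1_B_results %>% filter(nu == 3, rho == 0.4)

check_power <- power2MixedCountContinuous(
  n1 = check_row$n1, n2 = check_row$n2,
  r1 = 1, r2 = 2, nu = 3, t = 1,
  mu1 = -50, mu2 = 0, sd = 75,
  rho1 = 0.4, rho2 = 0.4,
  alpha = 0.025
)

cat("Co-primary power at the tabulated sample size:",
    round(as.numeric(check_power$powerCoprimary), 4), "\n")
```

The result should be at or just above the target of 0.9, since the sample size search stops at the smallest $n_{2}$ meeting the target.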
## Basic Usage Examples

### Example 1: Balanced Design

Calculate sample size for a balanced design with moderate effect sizes:

```{r example1}
# Balanced design: n1 = n2 (i.e., r = 1)
result_balanced <- ss2MixedCountContinuous(
  r1 = 1.0,     # Count rate in treatment group
  r2 = 1.25,    # Count rate in control group
  nu = 0.8,     # Dispersion parameter
  t = 1,        # Follow-up time
  mu1 = -50,    # Mean for treatment (negative = benefit)
  mu2 = 0,      # Mean for control
  sd = 250,     # Standard deviation
  r = 1,        # Balanced allocation
  rho1 = 0.5,   # Correlation in treatment group
  rho2 = 0.5,   # Correlation in control group
  alpha = 0.025,
  beta = 0.2
)

print(result_balanced)
```

### Example 2: Effect of Correlation

Demonstrate how correlation affects sample size:

```{r example2}
# Fixed effect sizes
r1 <- 1.0
r2 <- 1.25
nu <- 0.8
mu1 <- -50
mu2 <- 0
sd <- 250

# Range of correlations
rho_values <- c(0, 0.2, 0.4, 0.6, 0.8)

ss_by_rho <- sapply(rho_values, function(rho) {
  result <- ss2MixedCountContinuous(
    r1 = r1, r2 = r2, nu = nu, t = 1,
    mu1 = mu1, mu2 = mu2, sd = sd,
    r = 1, rho1 = rho, rho2 = rho,
    alpha = 0.025, beta = 0.2
  )
  result$n2
})

result_df <- data.frame(
  rho = rho_values,
  n_per_group = ss_by_rho,
  N_total = ss_by_rho * 2,
  reduction_pct = round((1 - ss_by_rho / ss_by_rho[1]) * 100, 1)
)

kable(result_df,
      caption = "Effect of Correlation on Sample Size",
      col.names = c("ρ", "n per group", "N total", "Reduction (%)"))

# Plot
plot(rho_values, ss_by_rho, type = "b", pch = 19,
     xlab = "Correlation (ρ)",
     ylab = "Sample size per group (n)",
     main = "Effect of Correlation on Sample Size",
     ylim = c(600, max(ss_by_rho) * 1.1))
grid()
```

**Interpretation**: Higher positive correlation reduces the required sample size. At $\rho = 0.8$, the sample size is reduced by approximately 5-7% compared to $\rho = 0$.

### Example 3: Effect of Dispersion Parameter

Compare sample sizes for different levels of overdispersion:

```{r example3}
# Fixed design parameters
r1 <- 1.0
r2 <- 1.25
mu1 <- -50
mu2 <- 0
sd <- 250
rho <- 0.5

# Range of dispersion parameters
nu_values <- c(0.5, 0.8, 1.0, 2.0, 5.0)

ss_by_nu <- sapply(nu_values, function(nu) {
  result <- ss2MixedCountContinuous(
    r1 = r1, r2 = r2, nu = nu, t = 1,
    mu1 = mu1, mu2 = mu2, sd = sd,
    r = 1, rho1 = rho, rho2 = rho,
    alpha = 0.025, beta = 0.2
  )
  result$n2
})

result_df_nu <- data.frame(
  nu = nu_values,
  VMR = round(1 + 1.125 / nu_values, 2),  # VMR at lambda = 1.125 (average)
  n_per_group = ss_by_nu,
  N_total = ss_by_nu * 2
)

kable(result_df_nu,
      caption = "Effect of Dispersion Parameter on Sample Size",
      digits = c(1, 2, 0, 0),
      col.names = c("ν", "VMR", "n per group", "N total"))
```

**Key finding**: Higher overdispersion (smaller $\nu$, larger VMR) requires larger sample sizes.
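The pattern in Example 3 traces back to the variance factor $V_{a}$ from the Test Statistics section: its $(1+r)/(\nu r)$ term grows as $\nu$ shrinks. The short base-R chunk below (no package functions, values matching Example 3 with $t = 1$) makes this explicit.

```{r va_vs_nu}
# Variance factor V_a of the log rate ratio as a function of nu
# (balanced design, t = 1, mean counts as in Example 3)
lambda1 <- 1.0    # treatment
lambda2 <- 1.25   # control
nu_grid <- c(0.5, 0.8, 1.0, 2.0, 5.0)

Va <- 1 / lambda2 + 1 / lambda1 + 2 / nu_grid
data.frame(nu = nu_grid, Va = round(Va, 2))
```

For a fixed log rate ratio, the sample size driven by the count endpoint is roughly proportional to $V_{a}$, so the smallest $\nu$ corresponds to the largest trial.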
### Example 4: Unbalanced Allocation

Calculate sample size with a 2:1 allocation ratio:

```{r example4}
# Balanced design (r = 1)
result_balanced <- ss2MixedCountContinuous(
  r1 = 1.0, r2 = 1.25, nu = 0.8, t = 1,
  mu1 = -50, mu2 = 0, sd = 250,
  r = 1, rho1 = 0.5, rho2 = 0.5,
  alpha = 0.025, beta = 0.2
)

# Unbalanced design (r = 2, i.e., n1 = 2*n2)
result_unbalanced <- ss2MixedCountContinuous(
  r1 = 1.0, r2 = 1.25, nu = 0.8, t = 1,
  mu1 = -50, mu2 = 0, sd = 250,
  r = 2, rho1 = 0.5, rho2 = 0.5,
  alpha = 0.025, beta = 0.2
)

comparison_allocation <- data.frame(
  Design = c("Balanced (1:1)", "Unbalanced (2:1)"),
  n_treatment = c(result_balanced$n1, result_unbalanced$n1),
  n_control = c(result_balanced$n2, result_unbalanced$n2),
  N_total = c(result_balanced$N, result_unbalanced$N)
)

kable(comparison_allocation,
      caption = "Comparison: Balanced vs Unbalanced Allocation",
      col.names = c("Design", "n₁", "n₂", "N total"))

cat("\nIncrease in total sample size:",
    round((result_unbalanced$N - result_balanced$N) / result_balanced$N * 100, 1), "%\n")
```

## Power Verification

Verify that the calculated sample sizes achieve the target power:

```{r power_verification}
# Use result from Example 1
power_result <- power2MixedCountContinuous(
  n1 = result_balanced$n1, n2 = result_balanced$n2,
  r1 = 1.0, r2 = 1.25, nu = 0.8, t = 1,
  mu1 = -50, mu2 = 0, sd = 250,
  rho1 = 0.5, rho2 = 0.5,
  alpha = 0.025
)

cat("Target power: 0.80\n")
cat("Achieved power (Count endpoint):", round(as.numeric(power_result$power1), 4), "\n")
cat("Achieved power (Continuous endpoint):", round(as.numeric(power_result$power2), 4), "\n")
cat("Achieved power (Co-primary):", round(as.numeric(power_result$powerCoprimary), 4), "\n")
```

## Practical Recommendations

### Design Considerations

1. **Estimating the dispersion parameter**: Use pilot data or historical studies to estimate $\nu$. Underestimating $\nu$ (overestimating overdispersion) leads to conservative sample sizes.
2. **Estimating correlation**: Use pilot data; be conservative if uncertain ($\rho = 0$ is conservative).
3. **Direction of benefit**: For COPD/asthma trials, ensure the test directions are correct (lower is better for both endpoints).
4. **Balanced allocation**: Generally the most efficient choice ($r = 1$) unless practical constraints require otherwise.
5. **Sensitivity analysis**: Calculate sample sizes over a range of plausible $\nu$, $\rho$, and effect sizes.

### When to Use This Method

Use mixed count-continuous methods when:

- One endpoint is an overdispersed count (exacerbations, events)
- The other endpoint is continuous (lung function, quality of life)
- Both endpoints are clinically meaningful co-primary endpoints
- The negative binomial distribution is appropriate for the count data

### Challenges and Considerations

1. **Overdispersion estimation**: Requires historical data or pilot studies; misspecification affects the sample size
2. **Correlation estimation**: The correlation between count and continuous outcomes is challenging to estimate
3. **Direction specification**: Ensure the direction of treatment benefit is correctly specified for both endpoints

## References

Homma, G., & Yoshida, T. (2024). Sample size calculation in clinical trials with two co-primary endpoints including overdispersed count and continuous outcomes. *Pharmaceutical Statistics*, 23(1), 46-59.