
Overview

This vignette covers how to compute power or Type I error for a design derived with a spending bound. We illustrate this for a general non-constant treatment effect, using gs_power_npe() to derive bounds under one parameter setting and to compute power under another. We use a trial with a binary endpoint to enable a full illustration.

Scenario for consideration

We consider a scenario largely based on the CAPTURE study (CAPTURE Investigators 1997), where the primary endpoint was a composite of death, acute myocardial infarction, or the need for recurrent percutaneous intervention within 30 days of randomization. That is, we consider a 2-arm trial with an experimental arm and a control arm. The primary endpoint is a binary indicator of whether a participant experiences a failure outcome. For this case, we consider the parameter \(\theta = p_1 - p_2\), where \(p_1\) denotes the probability that a trial participant in the control group experiences a failure and \(p_2\) denotes the same probability for a trial participant in the experimental group. We assume \(K=3\) analyses after 350, 700, and 1400 patients have been observed, with equal randomization between the treatment groups. The study was designed with approximately 80% power (Type II error \(\beta = 1 - 0.8 = 0.2\)) and 2.5% one-sided Type I error (\(\alpha = 0.025\)) to detect a reduction from a 15% event rate (\(p_1 = 0.15\)) in the control group to 10% (\(p_2 = 0.1\)) in the experimental group. We denote the alternate hypothesis as \[H_1: \theta = \theta_1= p_1^1 - p_2^1 = 0.15 - 0.10 = 0.05\] and the null hypothesis as \[H_0: \theta = \theta_0 = 0 = p_1^0 - p_2^0,\] where \(p^0_1 = p^0_2= (p_1^1+p_2^1)/2 = 0.125\) as laid out in Lachin (2009).

library(gsdmvn)

p1 <- 0.15 # assumed failure rate in control group
p2 <- 0.10 # assumed failure rate in experimental group
alpha <- 0.025 # Design Type I error, one-sided
beta <- 0.2 # Design Type II error for 80% power
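
As a quick cross-check of these assumptions (a sketch on our part, not part of the original design derivation), the corresponding fixed-design sample size can be computed with gsDesign::nBinomial(); it comes to roughly 1370 observations, in line with the trial's planned maximum of 1400.

# Fixed-design total sample size for the failure rates above
gsDesign::nBinomial(p1 = p1, p2 = p2, alpha = alpha, beta = beta, ratio = 1)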

We note that had we considered a success outcome, such as objective response in an oncology study, we would let \(p_1\) denote the experimental group response rate and \(p_2\) the control group response rate. Thus, we always set up the notation so that \(p_1>p_2\) represents superiority of the experimental group.

Notation and Statistical Testing

We let \(X_{ij}\sim \hbox{Bernoulli}(p_i)\), \(j=1,2,\ldots,n_{ik}\), denote independent random variables in the control (\(i=1\)) and experimental (\(i=2\)) groups through analysis \(k=1,2,\ldots,K\), where \(n_{i1} < n_{i2}<\ldots<n_{iK}\). We let \(Y_{ik}=\sum_{j=1}^{n_{ik}}X_{ij}\). At analysis \(k=1,2,\ldots,K\) and for \(i=1,2\) we denote the proportion of failures in group \(i\) at analysis \(k\) by

\[ \hat{p}_{ik}=Y_{ik}/n_{ik}.\] Estimating under the null hypothesis \(H_0: p_1=p_2\equiv p_0\), we estimate the common rate by

\[ \hat{p}_{0k}=\frac{Y_{1k}+ Y_{2k}}{n_{1k}+ n_{2k}}= \frac{n_{1k}\hat p_{1k} + n_{2k}\hat p_{2k}}{n_{1k} + n_{2k}}. \] We note

\[\hbox{Var}(\hat p_{ik})=\frac{p_{i}(1-p_i)}{n_{ik}},\] and its consistent estimator \[\widehat{\hbox{Var}}(\hat p_{ik})=\frac{\hat p_{ik}(1-\hat p_{ik})}{n_{ik}},\] \(i=1,2\), \(k=1,2,\ldots,K\). Letting \(\hat\theta_k=\hat p_{1k}-\hat p_{2k},\) we also have

\[\sigma^2_k\equiv \hbox{Var}(\hat\theta_k)=\frac{p_1(1-p_1)}{n_{1k}}+\frac{p_2(1-p_2)}{n_{2k}},\] its consistent estimator \[\hat\sigma^2_k=\frac{\hat p_{1k}(1-\hat p_{1k})}{n_{1k}}+\frac{\hat p_{2k}(1-\hat p_{2k})}{n_{2k}},\] and the corresponding null hypothesis estimator \[\hat\sigma^2_{0k}=\hat p_{0k}(1-\hat p_{0k})\left(\frac{1}{n_{1k}}+ \frac{1}{n_{2k}}\right),\] \(k=1,2,\ldots,K\). Statistical information for each of these quantities and their corresponding estimators is denoted by

\[\begin{align} \mathcal{I}_k &= 1/\sigma^2_k,\\ \hat{\mathcal{I}}_k &= 1/\hat\sigma^2_k,\\ \mathcal{I}_{0k} &= 1/\sigma^2_{0k},\\ \hat{\mathcal{I}}_{0k} &= 1/\hat\sigma^2_{0k}, \end{align}\] \(k=1,2,\ldots,K\). Testing, as recommended by Lachin (2009), is done with the large sample test using the null hypothesis variance estimate and without continuity correction: \[Z_k = \hat\theta_k/\hat\sigma_{0k}=\frac{\hat p_{1k} - \hat p_{2k}}{\sqrt{(1/n_{1k}+ 1/n_{2k})\hat p_{0k}(1-\hat p_{0k})} },\] which is asymptotically Normal(0, 1) if \(p_1=p_2\) and, more generally for any \(p_1, p_2\), asymptotically normal with mean \(\theta/\sigma_{0k}\) and variance \(\sigma^2_{k}/\sigma^2_{0k}\), \(k=1,2,\ldots,K\). We note that \(\chi^2=Z^2_k\) is the \(\chi^2\) test without continuity correction recommended by Gordon and Watson (1996). Note finally that this extends in a straightforward way to the non-inferiority test of Farrington and Manning (1990) if the null hypothesis is \(\theta = p_1 - p_2 - \delta = 0\) for some non-inferiority margin \(\delta > 0\); \(\delta < 0\) would correspond to what Chan (2002) refers to as super-superiority, requiring that the experimental therapy be shown superior to control by at least the margin \(-\delta>0\).
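
To make the test concrete, here is a minimal sketch (with hypothetical interim counts, not CAPTURE data) computing \(Z_k\) as defined above:

z_binomial <- function(y1, n1, y2, n2) {
  # Failure rate estimates by group and pooled under H0
  p1hat <- y1 / n1
  p2hat <- y2 / n2
  p0hat <- (y1 + y2) / (n1 + n2)
  # Large sample test with null variance estimate, no continuity correction
  (p1hat - p2hat) / sqrt(p0hat * (1 - p0hat) * (1 / n1 + 1 / n2))
}
# e.g., 28 vs. 18 failures with 175 observed per arm at a first interim
z_binomial(y1 = 28, n1 = 175, y2 = 18, n2 = 175) # about 1.58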

Power Calculations

We begin with a fixed design to simplify. We follow the approach of Lachin (2009) to compute power, accounting for the different variance (statistical information) computations under the null hypothesis noted in the previous section. We note the asymptotic equivalence

\[Z_k\approx \hat\theta_k/\sigma_{0k}=\frac{\hat p_{1k} - \hat p_{2k}}{\sqrt{(1/n_{1k}+ 1/n_{2k})p_{0}(1- p_0)} }.\] We denote \(n_k=n_{1k}+n_{2k}\), \(k=1,2,\ldots, K\), and assume a constant proportion \(\xi_i\) randomized to group \(i=1,2\), so that \(n_{ik}=\xi_i n_k\). Thus,

\[Z_k\approx \frac{\sqrt{n_k}(\hat p_{1k} - \hat p_{2k})}{\sqrt{(1/\xi_1+ 1/\xi_2)p_{0}(1- p_0)} }.\]

From this, we have the asymptotic distribution

\[Z_k\sim\hbox{Normal}\left(\sqrt{n_k}\frac{p_1 - p_2}{\sqrt{(1/\xi_1+ 1/\xi_2) p_0(1- p_0)} },\ \sigma^2_{k}/\sigma^2_{0k}\right).\] We note that

\[ \sigma^2_{k}/\sigma^2_{0k} = \frac{p_1(1-p_1)/\xi_1+p_2(1-p_2)/\xi_2}{ p_0(1-p_0)\left(1/\xi_1+ 1/\xi_2\right)}.\] We also note that, by definition, \(\sigma^2_{k}/\sigma^2_{0k}=\mathcal I_{0k}/\mathcal I_{k}\). Based on the inputs \(p_1\), \(p_2\), \(\xi_1\), \(\xi_2 = 1-\xi_1\), \(n_k\), and an assumed difference \(\delta\) (defaulting to \(p_1-p_2\)), we will compute \(\theta\), \(\mathcal{I}_k\), \(\mathcal{I}_{0k}\), and \(\mathcal{I}_{1k}\), \(k=1,2,\ldots,K\), where \(\mathcal{I}_{1k}\) is the information computed from rates \(p_1^*=p_0+\delta(1-\xi_1)\) and \(p_2^*=p_0-\delta\xi_1\), which differ by \(\delta\) and have \(\xi\)-weighted average \(p_0\).
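
Before generalizing, we can sanity-check this distribution with a minimal sketch (ours, not part of the original derivation): the fixed-design power at the planned maximum of 1400 observations follows directly from the mean and variance above.

theta <- 0.15 - 0.10 # p1 - p2
p0 <- 0.125 # (p1 + p2) / 2 under equal randomization
info <- 1400 / (0.15 * 0.85 / 0.5 + 0.10 * 0.90 / 0.5) # 1 / sigma_k^2
info0 <- 1400 / (p0 * (1 - p0) * (1 / 0.5 + 1 / 0.5)) # 1 / sigma_0k^2
# P(Z >= z_{1 - alpha}) under the asymptotic distribution above
pnorm(theta * sqrt(info) - qnorm(1 - 0.025) * sqrt(info / info0))
# about 0.81, consistent with the targeted 80% power for a fixed design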

library(tibble)

gs_info_binomial <- function(p1, p2, xi1, n, delta = NULL) {
  # Default treatment difference is the input difference p1 - p2
  if (is.null(delta)) delta <- p1 - p2
  # Compute (constant) effect size at each analysis, theta
  theta <- rep(p1 - p2, length(n))
  # Compute null hypothesis rate p0, weighted by randomization proportions
  p0 <- xi1 * p1 + (1 - xi1) * p2
  # Compute information based on p1, p2
  info <- n / (p1 * (1 - p1) / xi1 + p2 * (1 - p2) / (1 - xi1))
  # Compute information based on null hypothesis rate of p0
  info0 <- n / (p0 * (1 - p0) * (1 / xi1 + 1 / (1 - xi1)))
  # H1 rates p1star, p2star that differ by delta and whose
  # xi1-weighted average equals p0
  p1star <- p0 + delta * (1 - xi1)
  p2star <- p0 - delta * xi1
  # Compute information based on H1 rates p1star, p2star
  info1 <- n / (p1star * (1 - p1star) / xi1 + p2star * (1 - p2star) / (1 - xi1))
  tibble(
    Analysis = seq_along(n),
    n = n,
    theta = theta,
    theta1 = rep(delta, length(n)),
    info = info,
    info0 = info0,
    info1 = info1
  )
}

For the CAPTURE trial, we have

h1 <- gs_info_binomial(p1 = .15, p2 = .1, xi1 = .5, n = c(350, 700, 1400))
h1
#> # A tibble: 3 × 7
#>   Analysis     n theta theta1  info info0 info1
#>      <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
#> 1        1   350  0.05   0.05  805.   800  805.
#> 2        2   700  0.05   0.05 1609.  1600 1609.
#> 3        3  1400  0.05   0.05 3218.  3200 3218.

Now we examine information when the assumed treatment difference (\(\theta = 0.03\)) is smaller than the alternative; setting delta = .05 retains the design alternative \(\theta_1 = 0.05\):

h <- gs_info_binomial(p1 = .15, p2 = .12, xi1 = .5, delta = .05, n = c(350, 700, 1400))
h
#> # A tibble: 3 × 7
#>   Analysis     n theta theta1  info info0 info1
#>      <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
#> 1        1   350  0.03   0.05  751.  749.  753.
#> 2        2   700  0.03   0.05 1502. 1499. 1507.
#> 3        3  1400  0.03   0.05 3003. 2997. 3013.

We can plug these into gs_power_npe() with the intended spending functions. We begin with power under the alternate hypothesis:

gs_power_npe(theta = h1$theta, theta1 = h1$theta1, info = h1$info,
             info0 = h1$info0, info1 = h1$info1,
             upper = gs_spending_bound,
             upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025),
             lower = gs_spending_bound,
             lpar = list(sf = gsDesign::sfHSD, param = -2, total_spend = 0.2)
)
#> # A tibble: 6 × 9
#>   Analysis Bound      Z Probability theta theta1  info info0 info1
#>      <int> <chr>  <dbl>       <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
#> 1        1 Upper  4.33      0.00178  0.05   0.05  805.   800  805.
#> 2        2 Upper  2.96      0.169    0.05   0.05 1609.  1600 1609.
#> 3        3 Upper  1.97      0.794    0.05   0.05 3218.  3200 3218.
#> 4        1 Lower -0.629     0.0203   0.05   0.05  805.   800  805.
#> 5        2 Lower  0.295     0.0538   0.05   0.05 1609.  1600 1609.
#> 6        3 Lower  1.94      0.200    0.05   0.05 3218.  3200 3218.
We then compute power under the smaller assumed treatment difference:

gs_power_npe(theta = h$theta, theta1 = h$theta1, info = h$info,
             info0 = h$info0, info1 = h$info1,
             upper = gs_spending_bound,
             upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025),
             lower = gs_spending_bound,
             lpar = list(sf = gsDesign::sfHSD, param = -2, total_spend = 0.2)
)
#> # A tibble: 6 × 9
#>   Analysis Bound      Z Probability theta theta1  info info0 info1
#>      <int> <chr>  <dbl>       <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
#> 1        1 Upper  4.33     0.000224  0.03   0.05  751.  749.  753.
#> 2        2 Upper  2.96     0.0359    0.03   0.05 1502. 1499. 1507.
#> 3        3 Upper  1.97     0.364     0.03   0.05 3003. 2997. 3013.
#> 4        1 Lower -0.675    0.0672    0.03   0.05  751.  749.  753.
#> 5        2 Lower  0.230    0.195     0.03   0.05 1502. 1499. 1507.
#> 6        3 Lower  1.85     0.594     0.03   0.05 3003. 2997. 3013.
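
Note that the Probability column is cumulative across analyses: under the alternate hypothesis the efficacy bound is crossed by the final analysis with probability of about 0.79, while under the smaller treatment difference \(\theta = 0.03\) power drops to about 0.36 and the probability of crossing the futility bound rises to about 0.59.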

References

Chan, Ivan SF. 2002. “Power and Sample Size Determination for Noninferiority Trials Using an Exact Method.” Journal of Biopharmaceutical Statistics 12 (4): 457–69.
Farrington, Conor P, and Godfrey Manning. 1990. “Test Statistics and Sample Size Formulae for Comparative Binomial Trials with Null Hypothesis of Non-Zero Risk Difference or Non-Unity Relative Risk.” Statistics in Medicine 9 (12): 1447–54.
Gordon, Ian, and Ray Watson. 1996. “The Myth of Continuity-Corrected Sample Size Formulae.” Biometrics, 71–76.
CAPTURE Investigators. 1997. “Randomised Placebo-Controlled Trial of Abciximab Before and During Coronary Intervention in Refractory Unstable Angina: The CAPTURE Study.” Lancet 349: 1429–35.
Lachin, John M. 2009. Biostatistical Methods: The Assessment of Relative Risks. Vol. 509. John Wiley & Sons.