Efficacy and futility boundary update
Yujie Zhao and Keaven M. Anderson
Source:vignettes/articles/story-update-boundary.Rmd
Design assumptions
We assume two analyses: an interim analysis (IA) and a final analysis (FA). The IA is planned 20 months after opening enrollment, followed by the FA at month 36. The planned enrollment period spans 14 months, with the first 2 months having an enrollment rate of 1/3 the final rate, the next 2 months with a rate of 2/3 of the final rate, and the final rate for the remaining 10 months. To obtain the targeted 90% power, these rates will be multiplied by a constant. The control arm is assumed to follow an exponential distribution with a median of 9 months and the dropout rate is 0.0001 per month regardless of treatment group. Finally, the experimental treatment group is piecewise exponential with a 3-month delayed treatment effect; that is, in the first 3 months HR = 1 and the HR is 0.6 thereafter.
library(gsDesign2)
library(gt)
# One-sided Type I error and Type II error (1 - power)
alpha <- 0.0125
beta <- 0.1
# Enrollment
enroll_rate <- define_enroll_rate(
duration = c(2, 2, 10),
rate = (1:3) / 3
)
# Failure and dropout
fail_rate <- define_fail_rate(
duration = c(3, Inf),
fail_rate = log(2) / 9,
hr = c(1, 0.6),
dropout_rate = .0001
)
# IA and FA analysis time
analysis_time <- c(20, 36)
# Randomization ratio
ratio <- 1
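The enrollment and failure assumptions above can be checked with a little arithmetic. The following is a minimal base R sketch; the names `relative_total`, `final_rate_for()`, and `surv_experimental()` are illustrative and are not part of gsDesign2. It computes the total relative enrollment implied by the ramp-up, which fixes the multiplying constant once a sample size is chosen, and the monthly hazards implied by a 9-month control median with a 3-month delayed effect.

```r
# Relative enrollment rates and durations (months), as in the design above
rel_rate <- (1:3) / 3
duration <- c(2, 2, 10)

# Total enrollment expressed in units of the final monthly rate
relative_total <- sum(rel_rate * duration)

# Hypothetical helper: final monthly rate needed to enroll n_total subjects
final_rate_for <- function(n_total) n_total / relative_total

# Control hazard implied by an exponential median of 9 months
control_hazard <- log(2) / 9

# Experimental hazards: HR = 1 in months 0-3, HR = 0.6 afterwards
experimental_hazard <- control_hazard * c(1, 0.6)

# Experimental-arm survival at time t (months) under the piecewise model
surv_experimental <- function(t) {
  exp(-control_hazard * (pmin(t, 3) + 0.6 * pmax(t - 3, 0)))
}
```

For example, `final_rate_for(430)` gives the final monthly enrollment rate (about 35.8 per month) that would enroll 430 subjects over the 14-month enrollment period.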
We use the null hypothesis information for boundary crossing probability calculations under both the null and alternate hypotheses. This also implies that the null hypothesis information determines the information fraction used in the spending functions to derive the design.
info_scale <- "h0_info"
One-sided design
For the design, we have efficacy bounds at both the IA and FA. We use the Lan and DeMets (1983) spending function with a total alpha of 0.0125, which approximates an O’Brien-Fleming bound.
upper <- gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha, param = NULL)
x <- gs_design_ahr(
enroll_rate = enroll_rate,
fail_rate = fail_rate,
alpha = alpha,
beta = beta,
info_frac = NULL,
info_scale = "h0_info",
analysis_time = analysis_time,
ratio = ratio,
upper = upper,
upar = upar,
test_upper = TRUE,
lower = gs_b,
lpar = rep(-Inf, 2),
test_lower = FALSE
) |> to_integer()
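To make the spending function concrete, here is a minimal base R sketch of the Lan-DeMets spending function that approximates O'Brien-Fleming bounds. The function name `ldof_spend()` is illustrative; the design itself uses `gsDesign::sfLDOF`. Because the design evaluates spending at the null hypothesis information fraction, the values below are not expected to reproduce the design output exactly.

```r
# Lan-DeMets spending function approximating O'Brien-Fleming bounds:
# cumulative one-sided alpha spent at information fraction t
ldof_spend <- function(t, alpha) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}

# Cumulative spending at a few information fractions for alpha = 0.0125
t_grid <- c(0.25, 0.5, 0.65, 1)
spend <- ldof_spend(t_grid, alpha = 0.0125)
print(round(spend, 5))
```

Very little alpha is spent early, and the full alpha is spent when the information fraction reaches 1, which is the behavior that makes this function resemble an O'Brien-Fleming bound.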
The planned design targets:
- Planned events: 227, 349
- Planned information fraction for interim and final analysis: 0.6504, 1
- Planned cumulative alpha spending: 0.0018, 0.0125
- Planned efficacy bounds: 2.9048, 2.2593
We note that rounding up the final targeted events increases power slightly over the targeted 90%.
x |>
summary() |>
as_gt() |>
tab_header(title = "Planned design")
Planned design

Bound | Z | Nominal p^{1} | ~HR at bound^{2} | Cumulative boundary crossing probability: alternate hypothesis | Cumulative boundary crossing probability: null hypothesis |
---|---|---|---|---|---|
Analysis: 1 Time: 19.9 N: 430 Event: 227 AHR: 0.73 Information fraction: 0.65 | | | | | |
Efficacy | 2.90 | 0.0018 | 0.6800 | 0.2877 | 0.0018 |
Analysis: 2 Time: 35.8 N: 430 Event: 349 AHR: 0.68 Information fraction: 1 | | | | | |
Efficacy | 2.26 | 0.0119 | 0.7852 | 0.9032 | 0.0125 |

^{1} One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
^{2} Approximate hazard ratio to cross bound.
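The nominal p-value and approximate hazard ratio columns can be recovered from the Z bounds and event counts. The sketch below uses base R and assumes a Schoenfeld-type approximation for the hazard ratio at the bound; this is an assumption made for illustration and not a statement about the package internals.

```r
# Planned efficacy bounds and event counts from the design above
z_bound <- c(2.9048, 2.2593)
events <- c(227, 349)
ratio <- 1 # experimental:control randomization ratio

# Nominal one-sided p-value at each efficacy bound
nominal_p <- pnorm(-z_bound)

# Approximate hazard ratio needed to cross each bound
# (Schoenfeld-type approximation, assumed here for illustration)
hr_at_bound <- exp(-z_bound * sqrt((1 + ratio)^2 / (ratio * events)))

print(round(nominal_p, 4))
print(round(hr_at_bound, 4))
```

With 1:1 randomization the approximation reduces to `exp(-2 * z / sqrt(events))`, so a larger event count moves the hazard ratio required at the bound closer to 1.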
Bounds for alternate alpha
At the design stage, we may be required to report bounds under more than one $\alpha$, because $\alpha$ may be reallocated to this hypothesis after rejection of another hypothesis. The planned $\alpha$ is 0.0125. Assume the updated $\alpha$ is 0.025 due to reallocation of $\alpha$ from another hypothesis. The corresponding bounds are
gs_update_ahr(
x = x,
alpha = 0.025
) |>
summary(col_decimals = c(z = 4)) |>
as_gt(title = "Updated design",
subtitle = "For alternate alpha = 0.025")
Updated design
For alternate alpha = 0.025

Bound | Z | Nominal p^{1} | ~HR at bound^{2} | Cumulative boundary crossing probability: alternate hypothesis | Cumulative boundary crossing probability: null hypothesis |
---|---|---|---|---|---|
Analysis: 1 Time: 19.9 N: 430 Event: 227 AHR: 0.73 Information fraction: 0.65 | | | | | |
Efficacy | 2.5636 | 0.0052 | 0.7116 | 0.4133 | 0.0052 |
Analysis: 2 Time: 35.8 N: 430 Event: 349 AHR: 0.68 Information fraction: 1 | | | | | |
Efficacy | 1.9874 | 0.0234 | 0.8083 | 0.9421 | 0.0250 |

^{1} One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
^{2} Approximate hazard ratio to cross bound.
The updated boundaries above use the planned treatment effect and the planned statistical information under the null hypothesis, because the original design has info_scale = "h0_info".
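To see why the interim bound relaxes noticeably when alpha is doubled, the base R sketch below evaluates the Lan-DeMets O'Brien-Fleming-type spending function at both alpha levels. The helper name `ldof_spend()` is illustrative, and the planned event fraction 227/349 is used as a stand-in for the information fraction, which is an assumption; the package works with null hypothesis information, so its numbers can differ slightly.

```r
# Lan-DeMets O'Brien-Fleming-type spending (cumulative spend at fraction t)
ldof_spend <- function(t, alpha) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}

# Planned interim event fraction, used here as a proxy for information
t_ia <- 227 / 349

spend_original <- ldof_spend(t_ia, alpha = 0.0125)
spend_updated <- ldof_spend(t_ia, alpha = 0.025)

# Ratio of interim spending after versus before reallocation
print(spend_updated / spend_original)
```

Under this spending function, doubling the total alpha more than doubles the alpha available at the interim analysis.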
Updating bounds with observed events at time of analyses
We provide a simulation below where the observed events at the IA and FA differ from planned. In this case, the differences from planned are due to using calendar-based cutoffs for the simulated data. In practice, even when attempting to match event counts exactly, the observed events at analyses often differ from planned. We also assume the protocol specifies that the full $\alpha$ will be spent at the final analysis, even when there is a shortfall of events versus the design plan.
The observed data for this example are generated by simtrial::sim_pw_surv().
set.seed(123) # Make simulated data reproducible
# Generate trial data
observed_data <- simtrial::sim_pw_surv(
n = x$analysis$n[x$analysis$analysis == 2],
stratum = data.frame(stratum = "All", p = 1),
block = c(rep("control", 2), rep("experimental", 2)),
enroll_rate = x$enroll_rate,
fail_rate = (fail_rate |> simtrial::to_sim_pw_surv())$fail_rate,
dropout_rate = (fail_rate |> simtrial::to_sim_pw_surv())$dropout_rate
)
# Cut simulated data for interim analysis at planned calendar time
observed_data_ia <- observed_data |> simtrial::cut_data_by_date(analysis_time[1])
# Cut simulated data for final analysis at planned calendar time
observed_data_fa <- observed_data |> simtrial::cut_data_by_date(analysis_time[2])
The updated design is
# Set spending fraction for interim according to observed events
# divided by planned final events.
# Final spending fraction is 1 per plan even if there is a shortfall
# of events versus planned (as specified above)
ustime <- c(sum(observed_data_ia$event) / max(x$analysis$event), 1)
# Update bound
gs_update_ahr(
x = x,
ustime = ustime,
observed_data = list(observed_data_ia, observed_data_fa)
) |>
summary(col_decimals = c(z = 4)) |>
as_gt(title = "Updated design",
subtitle = paste0("With observed ", sum(observed_data_ia$event),
" events at IA and ", sum(observed_data_fa$event),
" events at FA"))
Updated design
With observed 241 events at IA and 353 events at FA

Bound | Z | Nominal p^{1} | ~HR at bound^{2} | Cumulative boundary crossing probability: alternate hypothesis | Cumulative boundary crossing probability: null hypothesis |
---|---|---|---|---|---|
Analysis: 1 Time: 19.9 N: 430 Event: 241 AHR: 0.73 Information fraction: 0.69 | | | | | |
Efficacy | 2.7882 | 0.0026 | 0.6982 | 0.3558 | 0.0026 |
Analysis: 2 Time: 35.8 N: 430 Event: 353 AHR: 0.69 Information fraction: 1 | | | | | |
Efficacy | 2.2688 | 0.0116 | 0.7854 | 0.8949 | 0.0125 |

^{1} One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
^{2} Approximate hazard ratio to cross bound.
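The interim spending fraction used in the update is simply the observed interim events divided by the planned final events. The base R sketch below repeats that arithmetic with the event counts reported above and shows its effect on cumulative spending. The helper `ldof_spend()` is an illustrative stand-in for `gsDesign::sfLDOF`.

```r
# Lan-DeMets O'Brien-Fleming-type spending (cumulative spend at fraction t)
ldof_spend <- function(t, alpha) {
  2 - 2 * pnorm(qnorm(1 - alpha / 2) / sqrt(t))
}

observed_ia_events <- 241 # observed at the calendar-based interim cut
planned_ia_events <- 227
planned_final_events <- 349

# Interim spending time from observed events; final spending time fixed at 1
ustime <- c(observed_ia_events / planned_final_events, 1)
planned_time <- c(planned_ia_events / planned_final_events, 1)

spend_observed <- ldof_spend(ustime, alpha = 0.0125)
spend_planned <- ldof_spend(planned_time, alpha = 0.0125)

print(round(ustime, 4))
print(round(spend_observed, 4))
```

Because more events than planned were observed at the interim, more alpha is spent there, which is consistent with the lower interim efficacy bound in the updated table. Setting the final spending time to 1 ensures the full alpha is spent at the final analysis regardless of the final event count.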
Two-sided asymmetric design, beta-spending with non-binding lower bound
In this section, we investigate a two-sided asymmetric design, with non-binding $\beta$-spending used to generate futility bounds. $\beta$-spending refers to spending Type II error (1 - power) for the lower bound crossing probabilities under the alternate hypothesis. Non-binding bound computation assumes that the trial continues after the lower bound is crossed when computing Type I error, but that it stops when computing Type II error.
In the original design, we employ the Lan-DeMets spending function used to approximate O’Brien-Fleming bounds (Lan and DeMets 1983) for both efficacy and futility bounds. The total spending is 0.0125 for efficacy and 0.1 for futility. In addition, we assume there is no futility test at the final analysis.
# Upper and lower bounds both use Lan-DeMets spending approximating
# O'Brien-Fleming bounds
upper <- gs_spending_bound
upar <- list(sf = gsDesign::sfLDOF, total_spend = alpha, param = NULL)
lower <- gs_spending_bound
lpar <- list(sf = gsDesign::sfLDOF, total_spend = beta, param = NULL)
x <- gs_design_ahr(
enroll_rate = enroll_rate,
fail_rate = fail_rate,
alpha = alpha,
beta = beta,
info_frac = NULL,
info_scale = "h0_info",
analysis_time = c(20, 36),
ratio = ratio,
upper = upper,
upar = upar,
test_upper = TRUE,
lower = lower,
lpar = lpar,
test_lower = c(TRUE, FALSE),
binding = FALSE
) |> to_integer()
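As a rough illustration of how the same spending function allocates Type I error for efficacy and Type II error for futility, the base R sketch below evaluates it at an information fraction of 0.65 for both totals. The helper `ldof_spend()` is illustrative, and 0.65 is an assumed round value; the design computes its own information fraction, so the exact interim spending differs slightly.

```r
# Lan-DeMets O'Brien-Fleming-type spending for a given total spend
ldof_spend <- function(t, total_spend) {
  2 - 2 * pnorm(qnorm(1 - total_spend / 2) / sqrt(t))
}

t_ia <- 0.65 # assumed interim information fraction

# Cumulative Type I error spent on the efficacy bound at the interim
alpha_spent <- ldof_spend(t_ia, total_spend = 0.0125)

# Cumulative Type II error spent on the futility bound at the interim
beta_spent <- ldof_spend(t_ia, total_spend = 0.1)

print(round(c(alpha = alpha_spent, beta = beta_spent), 4))
```

Since futility is tested only at the interim analysis (`test_lower = c(TRUE, FALSE)`), the interim is the only analysis at which Type II error is spent on a futility bound.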
In the planned design, we have
- Planned events: 236, 363
- Planned information fraction (timing): 0.6501, 1
- Planned cumulative alpha spending: 0.0018, 0.0125
- Planned efficacy bounds: 2.9057, 2.2593
- Planned futility bounds: 0.6453
Since we added futility bounds, the sample size and number of events are larger than we had above in the 1-sided example.
x |>
summary() |>
as_gt() |>
tab_header(title = "Planned design",
subtitle = "2-sided asymmetric design, non-binding futility")
Planned design
2-sided asymmetric design, non-binding futility

Bound | Z | Nominal p^{1} | ~HR at bound^{2} | Cumulative boundary crossing probability: alternate hypothesis | Cumulative boundary crossing probability: null hypothesis |
---|---|---|---|---|---|
Analysis: 1 Time: 19.9 N: 446 Event: 236 AHR: 0.73 Information fraction: 0.65 | | | | | |
Futility | 0.65 | 0.2594 | 0.9194 | 0.0402 | 0.7406 |
Efficacy | 2.91 | 0.0018 | 0.6850 | 0.3045 | 0.0018 |
Analysis: 2 Time: 36 N: 446 Event: 363 AHR: 0.68 Information fraction: 1 | | | | | |
Efficacy | 2.26 | 0.0119 | 0.7889 | 0.9035 | ^{3} 0.0124 |

^{1} One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
^{2} Approximate hazard ratio to cross bound.
^{3} Cumulative alpha for final analysis (0.0124) is less than the full alpha (0.0125) when the futility bound is non-binding. The smaller value subtracts the probability of crossing a futility bound before crossing an efficacy bound at a later analysis (0.0125 - 0.0001 = 0.0124) under the null hypothesis.
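The futility row of the table can be read with the same arithmetic used for efficacy bounds. The base R sketch below assumes that the interim Z statistic is standard normal under the null hypothesis and uses a Schoenfeld-type approximation for the hazard ratio at the bound; both are stated assumptions for illustration, not a description of the package internals.

```r
# Planned interim futility bound and interim events from the design above
z_futility <- 0.6453
events_ia <- 236

# Probability of falling below the futility bound under the null hypothesis
p_cross_null <- pnorm(z_futility)

# Nominal one-sided p-value at the futility bound
nominal_p <- pnorm(-z_futility)

# Approximate observed HR at the futility bound with 1:1 randomization
hr_at_bound <- exp(-2 * z_futility / sqrt(events_ia))

print(round(c(p_cross_null, nominal_p, hr_at_bound), 4))
```

Read this way, the interim futility bound is crossed when the observed hazard ratio is roughly 0.92 or worse, which happens about three times in four when there is no treatment effect.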
Bounds for alternate alpha
We may want to report the design bounds under multiple $\alpha$ in case Type I error is reallocated from another hypothesis. We now assume that $\alpha$ is 0.025, but we still use the same sample size and event timing as for the original $\alpha$ = 0.0125. The updated bounds are
gs_update_ahr(
x = x,
alpha = 0.025
) |>
summary(col_decimals = c(z = 4)) |>
as_gt(title = "Updated design",
subtitle = "For alpha = 0.025")
Updated design
For alpha = 0.025

Bound | Z | Nominal p^{1} | ~HR at bound^{2} | Cumulative boundary crossing probability: alternate hypothesis | Cumulative boundary crossing probability: null hypothesis |
---|---|---|---|---|---|
Analysis: 1 Time: 19.9 N: 446 Event: 236 AHR: 0.73 Information fraction: 0.65 | | | | | |
Futility | 0.6453 | 0.2594 | 0.9194 | 0.0402 | 0.7406 |
Efficacy | 2.5644 | 0.0052 | 0.7162 | 0.4324 | 0.0052 |
Analysis: 2 Time: 36 N: 446 Event: 363 AHR: 0.68 Information fraction: 1 | | | | | |
Efficacy | 1.9873 | 0.0234 | 0.8117 | 0.9320 | ^{3} 0.0244 |

^{1} One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
^{2} Approximate hazard ratio to cross bound.
^{3} Cumulative alpha for final analysis (0.0244) is less than the full alpha (0.025) when the futility bound is non-binding. The smaller value subtracts the probability of crossing a futility bound before crossing an efficacy bound at a later analysis (0.025 - 0.0006 = 0.0244) under the null hypothesis.
Updating bounds with observed events at time of analyses
We assume the same observed events as in the 1-sided example above.
The updated design is
# Update spending fraction as above
ustime <- c(sum(observed_data_ia$event) / max(x$analysis$event), 1)
gs_update_ahr(
x = x,
ustime = ustime,
# Spending fraction for futility bound same as for efficacy
lstime = ustime,
observed_data = list(observed_data_ia, observed_data_fa)
) |>
summary(col_decimals = c(z = 4)) |>
as_gt(title = "Updated design",
subtitle = paste0("With observed ", sum(observed_data_ia$event),
" events at IA and ", sum(observed_data_fa$event),
" events at FA"))
Updated design
With observed 241 events at IA and 353 events at FA

Bound | Z | Nominal p^{1} | ~HR at bound^{2} | Cumulative boundary crossing probability: alternate hypothesis | Cumulative boundary crossing probability: null hypothesis |
---|---|---|---|---|---|
Analysis: 1 Time: 19.9 N: 446 Event: 241 AHR: 0.73 Information fraction: 0.66 | | | | | |
Futility | 0.7073 | 0.2397 | 0.9129 | 0.0435 | 0.7603 |
Efficacy | 2.8518 | 0.0022 | 0.6925 | 0.3324 | 0.0022 |
Analysis: 2 Time: 36 N: 446 Event: 353 AHR: 0.69 Information fraction: 1 | | | | | |
Efficacy | 2.2614 | 0.0119 | 0.7861 | 0.8866 | ^{3} 0.0124 |

^{1} One-sided p-value for experimental vs control treatment. Value < 0.5 favors experimental, > 0.5 favors control.
^{2} Approximate hazard ratio to cross bound.
^{3} Cumulative alpha for final analysis (0.0124) is less than the full alpha (0.0125) when the futility bound is non-binding. The smaller value subtracts the probability of crossing a futility bound before crossing an efficacy bound at a later analysis (0.0125 - 0.0001 = 0.0124) under the null hypothesis.