Non-proportional effect size in group sequential design
Keaven M. Anderson
Source:vignettes/articles/story-npe-background.Rmd
story-npe-background.Rmd
Overview
The acronym NPES is short for non-proportional effect size. While it is motivated primarily by a use for when designing a time-to-event trial under non-proportional hazards (NPH), we have simplified and generalized the concept here. The model is likely to be useful for rank-based survival tests beyond the logrank test that will be considered initially by Tsiatis (1982). It could also be useful in other situations where treatment effect may vary over time in a trial for some reason. We generalize the framework of Chapter 2 of Proschan, Lan, and Wittes (2006) to incorporate the possibility of the treatment effect changing during the course of a trial in some systematic way. This vignettes addresses distribution theory and initial technical issues around computing
- boundary crossing probabilities
- bounds satisfying targeted boundary crossing probabilities
This is then applied to generalize computational algorithms provided in Chapter 19 of Jennison and Turnbull (1999) that are used to compute boundary crossing probabilities as well as boundaries for group sequential designs. Additional specifics around boundary computation, power and sample size are provided in a separate vignette.
The probability model
The continuous model and E-process
We consider a simple example here to motivate distribution theory that is quite general and applies across many situations. For instance, Proschan, Lan, and Wittes (2006) immediately suggest paired observations, time-to-event and binary outcomes as endpoints where the theory is applicable.
We assume for a given integer N>0 that X_{i} are independent, i=1,2,\ldots. For some integer K\leq N we assume we will perform analysis K times after 0<n_1<n_2,\ldots ,n_K = N observations are available for analysis. Note that we have not confined n\leq N, but N can be considered the final planned sample size. Proschan, Lan, and Wittes (2006) refer to the estimation or E-process which we extend here to
\hat{\theta}_k = \frac{\sum_{i=1}^{n_k} X_{i}}{n_k}\equiv \bar X_{k}. While Proschan, Lan, and Wittes (2006) has used \delta instead of \theta in our notation, we stick more closely to the notation of Jennison and Turnbull (1999) where \theta is used. For our example, we see \hat{\theta}_k\equiv\bar X_k represents the sample average at analysis k, 1\leq k\leq K. With a survival endpoint, \hat\theta_k would typically represent a Cox model coefficient representing the logarithm of the hazard ratio for experimental vs control treatment and n_k would represent the planned number of events at analysis k, 1\leq k\leq K. Denoting t_k=n_k/N, we assume that for some real-valued function \theta(t) for t \geq 0 we have for 1\leq k\leq K
E(\hat{\theta}_k) =\theta(t_k) =E(\bar X_k). In the models of Proschan, Lan, and Wittes (2006) and Jennison and Turnbull (1999) we would have \theta(t) equal to some constant \theta. We assume further that for i=1,2,\ldots \text{Var}(X_{i})=1. The sample average variance under this assumption is for 1\leq k\leq K
\text{Var}(\hat\theta(t_k))=\text{Var}(\bar X_k) = 1/ n_k. The statistical information for the estimate \hat\theta(t_k) for 1\leq k\leq K for this case is \mathcal{I}_k \equiv \frac{1}{\text{Var}(\hat\theta(t_k))} = n_k. We now see that t_k, 1\leq k\leq K is the so-called information fraction at analysis k in that t_k=\mathcal{I}_k/\mathcal{I}_K.
Z-process
The Z-process is commonly used (e.g., Jennison and Turnbull (1999)) and will be used below to extend the computational algorithm in Chapter 19 of Jennison and Turnbull (1999) by defining equivalently in the first and second lines below for k=1,\ldots,K
Z_{k} = \frac{\hat\theta_k}{\sqrt{\text{Var}(\hat\theta_k)}}= \sqrt{\mathcal{I}_k}\hat\theta_k= \sqrt{n_k}\bar X_k.
The variance for 1\leq k\leq K is \text{Var}(Z_k) = 1 and the expected value is
E(Z_{k})= \sqrt{\mathcal{I}_k}\theta(t_{k})= \sqrt{n_k}E(\bar X_k) .
B-process
B-values are mnemonic for Brownian motion. For 1\leq k\leq K we define B_{k}=\sqrt{t_k}Z_k which implies E(B_{k}) = \sqrt{t_{k}\mathcal{I}_k}\theta(t_k) = t_k \sqrt{\mathcal{I}_K} \theta(t_k) = \mathcal{I}_k\theta(t_k)/\sqrt{\mathcal{I}_K} and \text{Var}(B_k) = t_k.
For our example, we have
B_k=\frac{1}{\sqrt N}\sum_{i=1}^{n_k}X_i. It can be useful to think of B_k as a sum of independent random variables.
Summary of E-, Z- and B-processes
Statistic | Example | Expected value | Variance |
---|---|---|---|
\hat\theta_k | \bar X_k | \theta(t_k) | \mathcal{I}_k^{-1} |
Z_k=\sqrt{\mathcal{I}_k}\hat\theta_k | \sqrt{n_k}\bar X_k | \sqrt{\mathcal{I}_k}\theta(t_k) | 1 |
B_k=\sqrt{t_k}Z_k | \sum_{i=1}^{n_k}X_i/\sqrt N | t_k\sqrt{\mathcal{I}_K}\ heta(t_k)=\mathcal{I}_k\ heta(t_k)/\sqrt{\mathcal{I}_K} | t_k |
Conditional independence, covariance and canonical form
We assume independent increments in the B-process. That is, for 1\leq j < k\leq K \tag{1} B_k - B_j \sim \text{Normal} (\sqrt{\mathcal{I}_K}(t_k\theta(t_k)- t_j\theta(t_j)), t_k-t_j) independent of B_1,\ldots,B_j. As noted above, for a given 1\leq k\leq K we have for our example B_j=\sum_{i=1}^{n_j}X_i / \sqrt N. Because of independence of the sequence X_i, i=1,2,\ldots, we immediately have for 1\leq j\leq k\leq K \text{Cov}(B_j,B_k) = \text{Var}(B_j) = t_j. This leads further to \text{Corr}(B_j,B_k)=\frac{t_j}{\sqrt{t_jt_k}}=\sqrt{t_j/t_k}=\text{Corr}(Z_j,Z_k)=\text{Cov}(Z_j,Z_k) which is the covariance structure in the so-called canonical form of Jennison and Turnbull (1999). For our example, we have B_k=\frac{1}{\sqrt N}\sum_{i=1}^{n_k}X_i and B_k-B_j=\frac{1}{\sqrt N}\sum_{i=n_j + 1}^{n_k}X_i and the covariance is obvious. We assume independent increments in the B-process that will be demonstrated for the simple example above. That is, for 1\leq j < k\leq K \tag{1} B_k - B_j \sim \text{Normal} (\mathcal{I}_k\theta(t_k)- \mathcal{I}_j\theta(t_j), t_k-t_j) independent of B_1,\ldots,B_j. For a given 1\leq j\leq k\leq K we have for our example B_j=\sum_{i=1}^{n_j}X_i / (\sqrt N\sigma). Because of independence of the sequence X_i, i=1,2,\ldots, we immediately have for 1\leq j\leq k\leq K \text{Cov}(B_j,B_k) = \text{Var}(B_j) = t_j/t_k =\mathcal{I}_j/\mathcal{I}_k. This leads to \mathcal{I}_j/\mathcal{I}_k=\sqrt{t_j/t_k}=\text{Corr}(B_j,B_k)=\text{Corr}(Z_j,Z_k)=\text{Cov}(Z_j,Z_k) which is the covariance structure in the so-called canonical form of Jennison and Turnbull (1999). The independence of B_j and B_k-B_j=\sum_{i=n_j + 1}^{n_k}X_i/(\sqrt N\sigma) is obvious for this example.
Test bounds and crossing probabilities
In this section we define notation for bounds and boundary crossing probabilities for a group sequential design. We also define an algorithm for computing bounds based on a targeted boundary crossing probability at each analysis. The notation will be used elsewhere for defining one- and two-sided group sequential hypothesis testing. A value of \theta(t)>0 will reflect a positive benefit.
For k=1,2,\ldots,K-1, interim cutoffs -\infty \leq a_k< b_k\leq \infty are set; final cutoffs -\infty \leq a_K\leq b_K <\infty are also set. An infinite efficacy bound at an analysis means that bound cannot be crossed at that analysis. Thus, 3K parameters define a group sequential design: a_k, b_k, and \mathcal{I}_k, k=1,2,\ldots,K.
Notation for boundary crossing probabilities
We now apply the above distributional assumptions to compute boundary crossing probabilities. We use a shorthand notation in this section to have \theta represent \theta() and \theta=0 to represent \theta(t)\equiv 0 for all t. We denote the probability of crossing the upper boundary at analysis k without previously crossing a bound by
\alpha_{k}(\theta)=P_{\theta}(\{Z_{k}\geq b_{k}\}\cap_{j=1}^{i-1}\{a_{j}\leq Z_{j}< b_{j}\}), k=1,2,\ldots,K.
Next, we consider analogous notation for the lower bound. For k=1,2,\ldots,K denote the probability of crossing a lower bound at analysis k without previously crossing any bound by
\beta_{k}(\theta)=P_{\theta}((Z_{k}< a_{k}\}\cap_{j=1}^{k-1}\{ a_{j}\leq Z_{j}< b_{j}\}). For symmetric testing for analysis k we would have a_k= - b_k, \beta_k(0)=\alpha_k(0), k=1,2,\ldots,K. The total lower boundary crossing probability for a trial is denoted by \beta(\theta)\equiv\sum_{k=1}^{K}\beta_{k}(\theta). Note that we can also set a_k= -\infty for any or all analyses if a lower bound is not desired, k=1,2,\ldots,K. For k<K, we can set b_k=\infty where an upper bound is not desired. Obviously, for each k, we want either a_k>-\infty or b_k<\infty.