
The weighted parametric group sequential design (WPGSD) (Anderson et al. (2022)) approach allows one to take advantage of the known correlation structure in constructing efficacy bounds to control the family-wise error rate (FWER) for a group sequential design. Here correlation may be due to common observations in nested populations, due to common observations in overlapping populations, or due to common observations in the control arm.

Methodologies to calculate correlations

Suppose that in a group sequential trial there are $m$ elementary null hypotheses $H_i$, $i \in I = \{1, \ldots, m\}$, and there are $K$ analyses. Let $k$ be the index for the interim and final analyses, $k = 1, 2, \ldots, K$. For any nonempty set $J \subseteq I$, we denote the intersection hypothesis $H_J = \cap_{j \in J} H_j$. We note that $H_I$ is the global null hypothesis.
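As an illustration only, the family of intersection hypotheses can be listed by enumerating every nonempty subset $J \subseteq I$. The sketch below assumes $m = 3$ (as in the example later in this article); the object names `J_list` and `J_labels` are hypothetical and not part of wpgsd.

```r
# Enumerate every nonempty subset J of I = {1, ..., m}; each subset
# indexes one intersection hypothesis H_J (m = 3 is an assumption).
m <- 3
I <- seq_len(m)
J_list <- unlist(
  lapply(I, function(size) combn(I, size, simplify = FALSE)),
  recursive = FALSE
)
# Human-readable labels such as "H_{1,3}"
J_labels <- vapply(
  J_list,
  function(J) paste0("H_{", paste(J, collapse = ","), "}"),
  character(1)
)
J_labels
# There are 2^m - 1 = 7 subsets; the last one is I itself,
# i.e. the global null hypothesis H_I.
```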

We assume the plan is for all hypotheses to be tested at each of the $K$ planned analyses if the trial continues to the end for all hypotheses. We further assume that the joint distribution of the $m \times K$ test statistics for the $m$ individual hypotheses at all $K$ analyses is multivariate normal with a completely known correlation matrix.

Let $Z_{ik}$ be the standardized normal test statistic for hypothesis $i \in I$ at analysis $k$, $1 \le k \le K$. Let $n_{ik}$ be the number of events collected cumulatively through analysis $k$ for hypothesis $i$. Then $n_{i \wedge i', k \wedge k'}$ is the number of events included in both $Z_{ik}$ and $Z_{i'k'}$, for $i, i' \in I$ and $1 \le k, k' \le K$. The key to the parametric tests is to utilize the correlation among the test statistics. The correlation between $Z_{ik}$ and $Z_{i'k'}$ is

$$\text{Corr}(Z_{ik}, Z_{i'k'}) = \frac{n_{i \wedge i', k \wedge k'}}{\sqrt{n_{ik} \, n_{i'k'}}}.$$
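The formula needs only three event counts. A minimal sketch of it as an R function follows; `event_corr()` is a hypothetical helper written for this illustration, not a function exported by wpgsd.

```r
# Hypothetical helper (not a wpgsd function): correlation between Z_{ik}
# and Z_{i'k'} computed from event counts.
#   n_shared : n_{i ^ i', k ^ k'}, events counted in both statistics
#   n_ik     : cumulative events for hypothesis i through analysis k
#   n_i2k2   : cumulative events for hypothesis i' through analysis k'
event_corr <- function(n_shared, n_ik, n_i2k2) {
  # Shared events can never exceed either statistic's own event count
  stopifnot(n_shared <= min(n_ik, n_i2k2))
  n_shared / sqrt(n_ik * n_i2k2)
}

# 80 shared events, 100 and 110 events in the two statistics
round(event_corr(80, 100, 110), 2)
```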

Examples

We borrow Example 1 from Section 2 (Motivating Examples) of Anderson et al. (2022) and use it as the basis here. The setting is:

In a two-arm controlled clinical trial with one primary endpoint, there are three patient populations defined by the status of two biomarkers, A and B:

  • Biomarker A positive: population 1,
  • Biomarker B positive: population 2,
  • Overall population.

The 3 primary elementary hypotheses are:

  • H1: the experimental treatment is superior to the control in population 1
  • H2: the experimental treatment is superior to the control in population 2
  • H3: the experimental treatment is superior to the control in the overall population

Assume an interim analysis (IA) and a final analysis (FA) are planned for the study. The numbers of events are listed below:

library(tibble)
library(dplyr)
library(gt)

event_tb <- tribble(
  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
  "Population 1", 100, 200,
  "Population 2", 110, 220,
  "Overlap of Population 1 and 2", 80, 160,
  "Overall Population", 225, 450
)
event_tb %>%
  gt() %>%
  tab_header(title = "Number of events at each population")
Number of events at each population
| Population | Number of Event in IA | Number of Event in FA |
| --- | ---: | ---: |
| Population 1 | 100 | 200 |
| Population 2 | 110 | 220 |
| Overlap of Population 1 and 2 | 80 | 160 |
| Overall Population | 225 | 450 |

Correlation of different populations within the same analysis

Consider first a simple situation: we want the correlation between the test statistics for population 1 and population 2 at the interim analysis only. Then $k = k' = 1$, and to compare $H_1$ and $H_2$ we take $i = 1$ and $i' = 2$. The correlation is $\text{Corr}(Z_{11}, Z_{21}) = \frac{n_{1 \wedge 2, 1 \wedge 1}}{\sqrt{n_{11} \, n_{21}}}$. The numbers of events are listed below:

event_tbl <- tribble(
  ~Population, ~"Number of Event in IA",
  "Population 1", 100,
  "Population 2", 110,
  "Overlap in population 1 and 2", 80
)
event_tbl %>%
  gt() %>%
  tab_header(title = "Number of events at each population in example 1")
Number of events at each population in example 1
| Population | Number of Event in IA |
| --- | ---: |
| Population 1 | 100 |
| Population 2 | 110 |
| Overlap in population 1 and 2 | 80 |

The correlation is then $\text{Corr}(Z_{11}, Z_{21}) = \frac{80}{\sqrt{100 \times 110}} \approx 0.76$, where 80 is the number of events shared by populations 1 and 2 at the interim analysis.

Corr1 <- 80 / sqrt(100 * 110)
round(Corr1, 2)
## [1] 0.76
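To see where this value comes from, a small Monte Carlo sketch can help. It assumes an idealized null model (our simplification, not the wpgsd method) in which each event contributes an independent N(0, 1) score and each standardized statistic is the sum of its events' scores divided by the square root of its event count. Since a sum of n such scores is N(0, n), each block of events is simulated in one draw.

```r
# Idealized null model (assumption): each event adds an independent N(0, 1)
# score; a sum of n such scores is N(0, n).
set.seed(2022)
n_sim <- 20000
n_overlap <- 80             # events in both populations 1 and 2
n_only1 <- 100 - n_overlap  # events only in population 1
n_only2 <- 110 - n_overlap  # events only in population 2

s_shared <- rnorm(n_sim, sd = sqrt(n_overlap))
s_only1 <- rnorm(n_sim, sd = sqrt(n_only1))
s_only2 <- rnorm(n_sim, sd = sqrt(n_only2))

# Standardized statistics for population 1 and population 2 at the IA
z11 <- (s_shared + s_only1) / sqrt(100)
z21 <- (s_shared + s_only2) / sqrt(110)

# Empirical correlation; should be close to 80 / sqrt(100 * 110)
cor(z11, z21)
```

The shared scores are the only source of dependence, which is why the numerator of the formula counts only the common events.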

Correlation of different analyses within the same population

Now consider a single population, for example population 1, at different analyses (the interim and final analyses). Then $i = i' = 1$, and to compare the IA and FA we take $k = 1$ and $k' = 2$. The correlation is $\text{Corr}(Z_{11}, Z_{12}) = \frac{n_{1 \wedge 1, 1 \wedge 2}}{\sqrt{n_{11} \, n_{12}}}$. The numbers of events are listed below:

event_tb2 <- tribble(
  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
  "Population 1", 100, 200
)
event_tb2 %>%
  gt() %>%
  tab_header(title = "Number of events at each analysis in example 2")
Number of events at each analysis in example 2
| Population | Number of Event in IA | Number of Event in FA |
| --- | ---: | ---: |
| Population 1 | 100 | 200 |

The the corrleation could be simply calculated as Corr(Z11,Z12)=100100*200=0.71\text{Corr}(Z_{11},Z_{12})=\frac{100}{\sqrt{100*200}}=0.71 The 100 in the numerator is the overlap number of events of interim analysis and final analysis in population 1.

Corr2 <- 100 / sqrt(100 * 200)
round(Corr2, 2)
## [1] 0.71
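Within one population the shared count is simply the interim count, so the formula reduces algebraically to $\sqrt{n_{i1} / n_{i2}}$, the square root of the information fraction at the interim analysis, which is the familiar group sequential correlation. The short check below is our own illustration.

```r
# Same population, two analyses: n_{i ^ i, 1 ^ 2} = n_{i1}, so
# n_{i1} / sqrt(n_{i1} * n_{i2}) = sqrt(n_{i1} / n_{i2}).
n_ia <- 100  # population 1 events at the interim analysis
n_fa <- 200  # population 1 events at the final analysis

corr_formula <- n_ia / sqrt(n_ia * n_fa)
corr_info <- sqrt(n_ia / n_fa)  # square root of the information fraction

all.equal(corr_formula, corr_info)
round(corr_info, 2)
```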

Correlation of different analyses and different population

Finally, consider comparing population 1 at the interim analysis with population 2 at the final analysis. Then the populations are $i = 1$ and $i' = 2$, and the analyses are $k = 1$ and $k' = 2$. The correlation is $\text{Corr}(Z_{11}, Z_{22}) = \frac{n_{1 \wedge 2, 1 \wedge 2}}{\sqrt{n_{11} \, n_{22}}}$. The numbers of events are listed below:

event_tb3 <- tribble(
  ~Population, ~"Number of Event in IA", ~"Number of Event in FA",
  "Population 1", 100, 200,
  "Population 2", 110, 220,
  "Overlap in population 1 and 2", 80, 160
)
event_tb3 %>%
  gt() %>%
  tab_header(title = "Number of events at each population & analyses in example 3")
Number of events at each population & analyses in example 3
| Population | Number of Event in IA | Number of Event in FA |
| --- | ---: | ---: |
| Population 1 | 100 | 200 |
| Population 2 | 110 | 220 |
| Overlap in population 1 and 2 | 80 | 160 |

The correlation could be simply calculated as Corr(Z11,Z22)=80100*220=0.54\text{Corr}(Z_{11},Z_{22})=\frac{80}{\sqrt{100*220}}=0.54 The 80 in the numerator is the overlap number of events of population 1 in interim analysis and population 2 in final analysis.

Corr3 <- 80 / sqrt(100 * 220)
round(Corr3, 2)
## [1] 0.54
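Because the shared events here are the interim overlap, this cross correlation factors algebraically into the two simpler cases: the same-analysis correlation of populations 1 and 2 at the IA times the IA-to-FA correlation within population 2. The check below is our own illustration of that identity, not a wpgsd computation.

```r
# Identity (our illustration): Corr(Z_11, Z_22)
#   = Corr(Z_11, Z_21) * Corr(Z_21, Z_22) when the shared events are
#   the overlap of populations 1 and 2 at the interim analysis.
corr_pops_ia <- 80 / sqrt(100 * 110)    # Corr(Z_11, Z_21)
corr_pop2_time <- 110 / sqrt(110 * 220) # Corr(Z_21, Z_22)
corr_direct <- 80 / sqrt(100 * 220)     # Corr(Z_11, Z_22)

all.equal(corr_pops_ia * corr_pop2_time, corr_direct)
round(corr_pops_ia * corr_pop2_time, 2)
```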

Generate the correlation matrix with generate_corr()

Having seen how to calculate each correlation by hand, we can use generate_corr(), which is built on the same logic, to compute the correlations for all combinations of hypotheses and analyses at once.

First, we need an event table that summarizes the study. Its columns are:

  • H1 and H2 are the indices of a pair of hypotheses included in the multiplicity testing. For example, 1 may denote the hypothesis that the experimental treatment is superior to the control in population 1 (or for experimental arm 1), and 2 the corresponding hypothesis in population 2 (or for experimental arm 2). A row with H1 = H2 refers to a single hypothesis;
  • Analysis is the analysis index, for example, 1 for the interim analysis and 2 for the final analysis;
  • Event is the number of events common to the test statistics for H1 and H2 at that analysis (when H1 = H2, the total number of events for that hypothesis).

For example, H1=1, H2=1, Analysis=1, Event=100 indicates that 100 events are included in the test of hypothesis 1 (population 1) at the interim analysis.

Another example: H1=1, H2=2, Analysis=2, Event=160 indicates that 160 events are shared by the tests of hypotheses 1 and 2 at the final analysis, that is, 160 events fall in the overlap of populations 1 and 2.

Note that the column names are fixed: generate_corr() expects exactly H1, H2, Analysis, and Event.

library(wpgsd)
# The event table
event <- tibble::tribble(
  ~H1, ~H2, ~Analysis, ~Event,
  1, 1, 1, 100,
  2, 2, 1, 110,
  3, 3, 1, 225,
  1, 2, 1, 80,
  1, 3, 1, 100,
  2, 3, 1, 110,
  1, 1, 2, 200,
  2, 2, 2, 220,
  3, 3, 2, 450,
  1, 2, 2, 160,
  1, 3, 2, 200,
  2, 3, 2, 220
)

event %>%
  gt() %>%
  tab_header(title = "Number of events at each population & analyses")
Number of events at each population & analyses
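| H1 | H2 | Analysis | Event |
| ---: | ---: | ---: | ---: |
| 1 | 1 | 1 | 100 |
| 2 | 2 | 1 | 110 |
| 3 | 3 | 1 | 225 |
| 1 | 2 | 1 | 80 |
| 1 | 3 | 1 | 100 |
| 2 | 3 | 1 | 110 |
| 1 | 1 | 2 | 200 |
| 2 | 2 | 2 | 220 |
| 3 | 3 | 2 | 450 |
| 1 | 2 | 2 | 160 |
| 1 | 3 | 2 | 200 |
| 2 | 3 | 2 | 220 |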
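Before passing a hand-built table to generate_corr(), it can be worth checking it for obvious entry errors. The sketch below is a hypothetical pre-check of our own (`check_event_table()` is not part of wpgsd). It assumes that an overlap count can never exceed either hypothesis's own count and that cumulative counts never decrease across analyses. It uses a base `data.frame` copy of the table above so that it runs on its own.

```r
# Hypothetical pre-check, not part of wpgsd.
event_df <- data.frame(
  H1 = c(1, 2, 3, 1, 1, 2, 1, 2, 3, 1, 1, 2),
  H2 = c(1, 2, 3, 2, 3, 3, 1, 2, 3, 2, 3, 3),
  Analysis = rep(1:2, each = 6),
  Event = c(100, 110, 225, 80, 100, 110, 200, 220, 450, 160, 200, 220)
)

check_event_table <- function(ev) {
  # 1. The four fixed column names must be present
  stopifnot(all(c("H1", "H2", "Analysis", "Event") %in% names(ev)))
  # Events for a single hypothesis h (row with H1 == H2 == h) at analysis a
  own <- function(h, a) {
    n <- ev$Event[ev$H1 == h & ev$H2 == h & ev$Analysis == a]
    if (length(n) != 1) stop("missing or duplicated diagonal row for H", h)
    n
  }
  # 2. An overlap cannot exceed either hypothesis's own event count
  for (r in seq_len(nrow(ev))) {
    a <- ev$Analysis[r]
    if (ev$Event[r] > min(own(ev$H1[r], a), own(ev$H2[r], a))) {
      stop("row ", r, ": overlap exceeds a single-hypothesis count")
    }
  }
  # 3. Cumulative counts must not decrease from one analysis to the next
  ord <- order(ev$Analysis)
  by_pair <- split(ev$Event[ord], paste(ev$H1, ev$H2)[ord])
  if (!all(vapply(by_pair, function(x) all(diff(x) >= 0), logical(1)))) {
    stop("event counts decrease across analyses")
  }
  invisible(TRUE)
}

check_event_table(event_df)
```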

Then we pass the event table above to generate_corr() and obtain the following correlation matrix.

generate_corr(event)

##          H1_A1     H2_A1     H3_A1     H1_A2     H2_A2     H3_A2
## [1,] 1.0000000 0.7627701 0.6666667 0.7071068 0.5393599 0.4714045
## [2,] 0.7627701 1.0000000 0.6992059 0.5393599 0.7071068 0.4944132
## [3,] 0.6666667 0.6992059 1.0000000 0.4714045 0.4944132 0.7071068
## [4,] 0.7071068 0.5393599 0.4714045 1.0000000 0.7627701 0.6666667
## [5,] 0.5393599 0.7071068 0.4944132 0.7627701 1.0000000 0.6992059
## [6,] 0.4714045 0.4944132 0.7071068 0.6666667 0.6992059 1.0000000
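To connect the matrix back to the hand calculations, here is a base-R sketch that rebuilds it from the same event table. It is not the wpgsd implementation. It assumes that the events shared by analyses $k$ and $k'$ are those at the earlier analysis $\min(k, k')$, and `shared_events()` is a hypothetical helper. The columns follow the order H1_A1, H2_A1, H3_A1, H1_A2, H2_A2, H3_A2 shown above.

```r
# Base-R sketch, not the wpgsd implementation.
event_df <- data.frame(
  H1 = c(1, 2, 3, 1, 1, 2, 1, 2, 3, 1, 1, 2),
  H2 = c(1, 2, 3, 2, 3, 3, 1, 2, 3, 2, 3, 3),
  Analysis = rep(1:2, each = 6),
  Event = c(100, 110, 225, 80, 100, 110, 200, 220, 450, 160, 200, 220)
)

# Events shared by hypotheses h1 and h2 at analysis a (pair order ignored)
shared_events <- function(h1, h2, a) {
  hit <- (event_df$H1 == h1 & event_df$H2 == h2) |
    (event_df$H1 == h2 & event_df$H2 == h1)
  event_df$Event[hit & event_df$Analysis == a]
}

# One row per test statistic; hypothesis varies fastest
grid <- expand.grid(H = 1:3, A = 1:2)
p <- nrow(grid)
corr_mat <- matrix(NA_real_, p, p,
                   dimnames = list(NULL, paste0("H", grid$H, "_A", grid$A)))

for (r in seq_len(p)) {
  for (s in seq_len(p)) {
    # Numerator: n_{i ^ i', k ^ k'}, taken at the earlier analysis
    num <- shared_events(grid$H[r], grid$H[s], min(grid$A[r], grid$A[s]))
    # Denominator: sqrt(n_{ik} * n_{i'k'})
    den <- sqrt(shared_events(grid$H[r], grid$H[r], grid$A[r]) *
                shared_events(grid$H[s], grid$H[s], grid$A[s]))
    corr_mat[r, s] <- num / den
  }
}
corr_mat
```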

References

Anderson, Keaven M, Zifang Guo, Jing Zhao, and Linda Z Sun. 2022. “A Unified Framework for Weighted Parametric Group Sequential Design.” Biometrical Journal 64 (7): 1219–39.