Computing p-values for Fleming-Harrington weighted logrank tests and the MaxCombo test
Source: vignettes/pMaxComboVignette.Rmd
Introduction
This vignette demonstrates use of a simple routine to do simulations and testing using Fleming-Harrington weighted logrank tests and the MaxCombo test. In addition, we demonstrate how to perform these tests with a dataset not generated by simulation routines within the package. Note that all p-values computed here are one-sided with small values indicating that the experimental treatment is favored.
Defining the test
The MaxCombo test has been posed as the maximum of multiple Fleming-Harrington weighted logrank tests (Harrington and Fleming (1982), Fleming and Harrington (2011)). Combination tests looking at a maximum of selected tests in this class have also been proposed; see Lee (2007), Roychoudhury et al. (2021), and Lin et al. (2020). The Fleming-Harrington class is indexed by the parameters \(\rho\ge 0\) and \(\gamma\ge 0\). We will denote these as FH(\(\rho,\gamma\)). This class includes the logrank test as FH(0,0). Other tests of interest here include:
- FH(0,1): a test that down-weights early events
- FH(1,0): a test that down-weights late events
- FH(1,1): a test that down-weights events increasingly as their quantiles differ from the median
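As a minimal illustration of these weights (a sketch, not part of the package API), the Fleming-Harrington weight at time \(t\) is \(w(t)=\hat S(t-)^\rho\{1-\hat S(t-)\}^\gamma\), where \(\hat S(t-)\) is the left-continuous Kaplan-Meier estimate of the pooled survival function. It can be computed directly:

```r
library(survival)

# Pooled Kaplan-Meier estimate; the aml dataset is used purely for illustration
km <- survfit(Surv(time, status) ~ 1, data = aml)
s_minus <- c(1, head(km$surv, -1)) # left-continuous S(t-)

# Fleming-Harrington weight w(t) = S(t-)^rho * (1 - S(t-))^gamma
fh_weight <- function(s, rho, gamma) s^rho * (1 - s)^gamma

w_01 <- fh_weight(s_minus, rho = 0, gamma = 1) # down-weights early events
w_10 <- fh_weight(s_minus, rho = 1, gamma = 0) # down-weights late events
```

Note that FH(0,0) gives constant weight 1, recovering the ordinary logrank test.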
Executing for a single dataset
Generating test statistics with simfix()
We begin with a single trial simulation generated by the routine `simfix()` using default arguments. `simfix()` produces one record per test and data cutoff method per simulation. Here we choose 3 tests: logrank = FH(0,0), FH(0,1), and FH(1,1). When more than one test is chosen, the correlation between tests is computed as shown by Karrison (2016), in this case in the columns `V1`, `V2`, `V3`. The columns `rho` and `gamma` indicate the \(\rho\) and \(\gamma\) used to compute each test. `Z` is the FH(\(\rho,\gamma\)) normal test statistic with variance 1, with a negative value favoring experimental treatment. The variable `cut` indicates how the data were cut for analysis, in this case at the maximum of the targeted minimum follow-up after last enrollment and the date at which the targeted event count was reached. `Sim` is a sequential index of the simulations performed.
```r
x <- simfix(nsim = 1, timingType = 5, rg = tibble::tibble(rho = c(0, 0, 1), gamma = c(0, 1, 1)))
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
#> `.name_repair` is omitted as of tibble 2.0.0.
#> ℹ Using compatibility `.name_repair`.
#> ℹ The deprecated feature was likely used in the simtrial package.
#> Please report the issue at <https://github.com/Merck/simtrial/issues>.
x %>% kable(digits = 2)
```
Events | lnhr | rho | gamma | Z | V1 | V2 | V3 | cut | Duration | Sim |
---|---|---|---|---|---|---|---|---|---|---|
360 | -0.41 | 0 | 0 | -3.81 | 1.00 | 0.86 | 0.93 | Max(min follow-up, event cut) | 75.83 | 1 |
360 | -0.41 | 0 | 1 | -4.33 | 0.86 | 1.00 | 0.94 | Max(min follow-up, event cut) | 75.83 | 1 |
360 | -0.41 | 1 | 1 | -4.08 | 0.93 | 0.94 | 1.00 | Max(min follow-up, event cut) | 75.83 | 1 |
Once the data are in this format, the MaxCombo p-value of Karrison (2016) and Roychoudhury et al. (2021) can be computed as follows (note that you will need the mvtnorm package installed):
```r
pMaxCombo(x)
#> [1] 7.360481e-06
```
Generating data with simtrial::simPWSurv()
We begin with another simulation, this time generated by `simtrial::simPWSurv()`. Again, we use defaults for that routine.
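The generating call is not shown in this rendering; with the stated defaults it is simply (a sketch — all arguments are left at their `simPWSurv()` defaults):

```r
library(simtrial)
library(dplyr)
library(knitr)

# Simulate a piecewise-exponential trial dataset with default arguments
s <- simPWSurv()
head(s) %>% kable(digits = 2)
```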
Stratum | enrollTime | Treatment | failTime | dropoutTime | cte | fail |
---|---|---|---|---|---|---|
All | 0.37 | Control | 23.06 | 14.65 | 15.02 | 0 |
All | 0.72 | Control | 7.18 | 665.16 | 7.91 | 1 |
All | 0.94 | Experimental | 6.75 | 217.78 | 7.69 | 1 |
All | 0.94 | Experimental | 21.83 | 183.76 | 22.77 | 1 |
All | 0.98 | Control | 10.62 | 400.73 | 11.61 | 1 |
All | 1.01 | Experimental | 64.43 | 1446.44 | 65.43 | 1 |
Once generated, we need to cut the data for analysis. Here we cut after 75 events.
```r
x <- s %>% cutDataAtCount(75)
head(x) %>% kable(digits = 2)
```
tte | event | Stratum | Treatment |
---|---|---|---|
14.65 | 0 | All | Control |
7.18 | 1 | All | Control |
6.75 | 1 | All | Experimental |
21.83 | 1 | All | Experimental |
10.62 | 1 | All | Control |
31.07 | 0 | All | Experimental |
Now we can analyze these data. We begin with `s` to show how this can be done in a single line. In this case, we use the 4-test combination suggested in Lin et al. (2020) and Roychoudhury et al. (2021).
```r
Z <- s %>%
  cutDataAtCount(75) %>%
  counting_process(txval = "Experimental") %>%
  tenFHcorr(rg = tibble(rho = c(0, 0, 1, 1), gamma = c(0, 1, 0, 1)))
Z %>% kable(digits = 2)
```
rho | gamma | Z | V1 | V2 | V3 | V4 |
---|---|---|---|---|---|---|
0 | 0 | -0.36 | 1.00 | 0.86 | 0.94 | 0.93 |
0 | 1 | -1.81 | 0.86 | 1.00 | 0.65 | 0.95 |
1 | 0 | 0.63 | 0.94 | 0.65 | 1.00 | 0.79 |
1 | 1 | -1.40 | 0.93 | 0.95 | 0.79 | 1.00 |
Now we compute our p-value as before:
```r
pMaxCombo(Z)
#> [1] 0.06472182
```
Suppose we want the p-value based on just the logrank, FH(0,1), and FH(1,0), as suggested by Lee (2007). We remove the row and column associated with FH(1,1) and then apply `pMaxCombo()`.
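Continuing with the `Z` tibble computed above, that subsetting can be sketched as follows (assuming the column layout shown in the table, where row 4 and column `V4` correspond to FH(1,1)):

```r
# Drop the FH(1,1) row (row 4) and its correlation column V4,
# then compute the MaxCombo p-value for the remaining 3 tests
Z %>%
  filter(row_number() != 4) %>%
  select(-V4) %>%
  pMaxCombo()
```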
Using survival data in another format
For a trial not generated by `simfix()`, the process is slightly more involved. We consider survival data not in the simtrial format and show the transformation needed. In this case, we use the small `aml` dataset from the survival package.
time | status | x |
---|---|---|
9 | 1 | Maintained |
13 | 1 | Maintained |
13 | 0 | Maintained |
18 | 1 | Maintained |
23 | 1 | Maintained |
28 | 0 | Maintained |
We rename variables and create a stratum variable as follows:
```r
x <- aml %>% transmute(tte = time, event = status, Stratum = "All", Treatment = as.character(x))
head(x) %>% kable()
```
tte | event | Stratum | Treatment |
---|---|---|---|
9 | 1 | All | Maintained |
13 | 1 | All | Maintained |
13 | 0 | All | Maintained |
18 | 1 | All | Maintained |
23 | 1 | All | Maintained |
28 | 0 | All | Maintained |
Now we analyze the data with a MaxCombo test combining the logrank and FH(0,1) and compute a p-value.
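A sketch of that analysis, reusing the pipeline shown earlier in this vignette; here we assume "Maintained" plays the role of the experimental arm:

```r
# Combine logrank = FH(0,0) and FH(0,1), then compute the MaxCombo p-value
Z <- x %>%
  counting_process(txval = "Maintained") %>%
  tenFHcorr(rg = tibble(rho = c(0, 0), gamma = c(0, 1)))
pMaxCombo(Z)
```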
Simulation
We now consider the example simulation from the `pMaxCombo()` help file to demonstrate how to simulate power for the MaxCombo test. However, we increase the number of simulations to 100 in this case; a larger number (e.g., 1000) should be used for a better estimate of design properties. Here we will test at the \(\alpha=0.001\) level.
```r
# Only use cut events + min follow-up
xx <- simfix(nsim = 100, timingType = 5, rg = tibble(rho = c(0, 0, 1), gamma = c(0, 1, 1)))
# MaxCombo power estimate for cutoff at max of targeted events, minimum follow-up
p <- unlist(xx %>% group_by(Sim) %>% group_map(pMaxCombo))
mean(p < .001)
#> [1] 0.81
```
Note that `group_map` above produces a list of p-values, one per simulation. It would be nice to have something that worked more like `dplyr::summarize()` to avoid `unlist()` and to allow evaluating, say, multiple data cutoff methods. The latter can be done without having to re-run all simulations as follows, demonstrated with a smaller number of simulations.
```r
# Only use cuts for events and events + min follow-up
xx <- simfix(nsim = 100, timingType = c(2, 5), rg = tibble(rho = 0, gamma = c(0, 1)))
head(xx) %>% kable(digits = 2)
```
Events | lnhr | rho | gamma | Z | V1 | V2 | cut | Duration | Sim |
---|---|---|---|---|---|---|---|---|---|
350 | -0.31 | 0 | 0 | -2.93 | 1.00 | 0.86 | Targeted events | 70.44 | 1 |
350 | -0.31 | 0 | 1 | -2.11 | 0.86 | 1.00 | Targeted events | 70.44 | 1 |
350 | -0.31 | 0 | 0 | -2.93 | 1.00 | 0.86 | Max(min follow-up, event cut) | 70.44 | 1 |
350 | -0.31 | 0 | 1 | -2.11 | 0.86 | 1.00 | Max(min follow-up, event cut) | 70.44 | 1 |
350 | -0.42 | 0 | 0 | -3.90 | 1.00 | 0.85 | Targeted events | 66.59 | 2 |
350 | -0.42 | 0 | 1 | -5.07 | 0.85 | 1.00 | Targeted events | 66.59 | 2 |
Now we compute a p-value separately for each cut type, first for targeted event count.
```r
# Subset to targeted events cutoff tests
p <- unlist(xx %>% filter(cut == "Targeted events") %>% group_by(Sim) %>% group_map(pMaxCombo))
mean(p < .025)
#> [1] 0.97
```
Now we use the later of targeted events and minimum follow-up cutoffs.
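A sketch of that computation, filtering on the other cutoff label shown in the table above:

```r
# Subset to the max(min follow-up, event cut) cutoff tests
p <- unlist(xx %>%
  filter(cut != "Targeted events") %>%
  group_by(Sim) %>%
  group_map(pMaxCombo))
mean(p < .025)
```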