# Statistical information under null and alternative hypotheses


At the $k$-th analysis of a group sequential design, the Z-score is

$$Z_k = \frac{\delta_k}{\sqrt{\text{Var}(\delta_k \mid H_0)}}.$$

The statistical information $\mathcal I_k$ is defined as the inverse of the variance of $\delta_k$, i.e., $\mathcal I_k = 1 / \text{Var}(\delta_k)$.
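As a small illustration of these two definitions, the sketch below computes $Z_k$ and $\mathcal I_k$ from a given effect estimate and variance. The function names `z_score` and `information` and the numeric inputs are hypothetical, chosen only for the example.

```python
import math

def information(var_delta: float) -> float:
    """Statistical information I_k = 1 / Var(delta_k)."""
    return 1.0 / var_delta

def z_score(delta_hat: float, var_delta_null: float) -> float:
    """Z_k = delta_k / sqrt(Var(delta_k | H0))."""
    return delta_hat / math.sqrt(var_delta_null)

# Hypothetical values: estimated effect 1.0, variance under H0 equal to 0.25.
print(z_score(1.0, 0.25))                  # 2.0
print(information(0.25))                   # 4.0
# With I_k computed under H0, Z_k can equivalently be written delta_k * sqrt(I_k).
print(1.0 * math.sqrt(information(0.25)))  # 2.0
```

The last line shows why the information under $H_0$ is the natural scale for the test statistic: $Z_k = \delta_k \sqrt{\mathcal I_k}$ when $\mathcal I_k$ is evaluated under the null.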

## Continuous outcomes

Imagine a trial with a continuous outcome. Let $X_{i,0} \sim N(\mu_0, \sigma^2)$ for subjects $i = 1, \ldots, n_0$ in the control arm and $X_{i,1} \sim N(\mu_1, \sigma^2)$ for subjects $i = 1, \ldots, n_1$ in the experimental arm.

For a superiority design, the tested hypotheses are

$$H_0: \mu_0 = \mu_1 \quad \text{vs.} \quad H_1: \mu_1 > \mu_0.$$

Suppose that at the $k$-th analysis there are $n_{0k}$ subjects in the control arm and $n_{1k}$ subjects in the experimental arm. The treatment effect $\delta_k$ is the difference in group means,

$$\delta_k = \frac{\sum_{i=1}^{n_{1k}} X_{i,1}}{n_{1k}} - \frac{\sum_{i=1}^{n_{0k}} X_{i,0}}{n_{0k}},$$

which is estimated by

$$\widehat\delta_k = \frac{\sum_{i=1}^{n_{1k}} x_{i,1}}{n_{1k}} - \frac{\sum_{i=1}^{n_{0k}} x_{i,0}}{n_{0k}},$$

where $x_{i,j}$ is the observed value of $X_{i,j}$ for subject $i$ in arm $j$.
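The estimator $\widehat\delta_k$ is simply the experimental-arm sample mean minus the control-arm sample mean. A minimal sketch, where `delta_hat` and the outcome values are hypothetical:

```python
from statistics import mean

def delta_hat(x_experimental, x_control):
    """Estimated difference in group means (experimental minus control)."""
    return mean(x_experimental) - mean(x_control)

# Hypothetical outcomes observed by analysis k
x1 = [3.0, 5.0, 7.0]  # experimental arm, n_1k = 3
x0 = [2.0, 4.0]       # control arm, n_0k = 2
print(delta_hat(x1, x0))  # 2.0
```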

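The statistical information satisfies

$$\mathcal I_k^{-1} = \text{Var}(\delta_k) = \sigma^2 \left(\frac{1}{n_{1k}} + \frac{1}{n_{0k}}\right)$$

under both $H_0$ and $H_1$, because this variance does not depend on $\mu_0$ or $\mu_1$. It is estimated by

$$\widehat{\mathcal I}_k^{-1} = \widehat\sigma^2 \left(\frac{1}{n_{1k}} + \frac{1}{n_{0k}}\right),$$

where $\widehat\sigma^2$ is an estimate of $\sigma^2$.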
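The text leaves the choice of $\widehat\sigma^2$ open. The sketch below assumes the pooled sample variance, which is one common choice when both arms share $\sigma^2$; the function names are hypothetical.

```python
from statistics import variance

def pooled_variance(x1, x0):
    """Pooled sample variance (an assumed choice of sigma_hat^2 under equal variances)."""
    n1, n0 = len(x1), len(x0)
    return ((n1 - 1) * variance(x1) + (n0 - 1) * variance(x0)) / (n1 + n0 - 2)

def info_continuous(sigma2, n1k, n0k):
    """I_k = 1 / [sigma^2 (1/n_1k + 1/n_0k)]; the same under H0 and H1."""
    return 1.0 / (sigma2 * (1.0 / n1k + 1.0 / n0k))

x1 = [3.0, 5.0, 7.0]  # hypothetical experimental-arm outcomes
x0 = [2.0, 4.0]       # hypothetical control-arm outcomes
s2 = pooled_variance(x1, x0)
print(s2)                                     # approximately 3.333 (= 10/3)
print(info_continuous(s2, len(x1), len(x0)))  # approximately 0.36
```

Because $\sigma^2$ does not involve the means, a single function serves both hypotheses here, unlike the binary case below.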

## Binary outcomes

Imagine a trial with a binary outcome. Let $X_{i,0} \sim \text{Bernoulli}(p_0)$ for patients $i = 1, \ldots, n_0$ in the control arm and $X_{i,1} \sim \text{Bernoulli}(p_1)$ for patients $i = 1, \ldots, n_1$ in the experimental arm, where $p_0$ and $p_1$ are the failure probabilities. Suppose that at the $k$-th analysis there are $n_{0k}$ subjects in the control arm and $n_{1k}$ subjects in the experimental arm. For a superiority design, the null and alternative hypotheses are

$$H_0: p_0 = p_1 = p \quad \text{vs.} \quad H_1: p_0 > p_1.$$

The natural-scale treatment effect is

$$\delta_k = \frac{\sum_{i=1}^{n_{1k}} X_{i,1}}{n_{1k}} - \frac{\sum_{i=1}^{n_{0k}} X_{i,0}}{n_{0k}},$$

which is estimated by

$$\widehat\delta_k = \frac{\sum_{i=1}^{n_{1k}} x_{i,1}}{n_{1k}} - \frac{\sum_{i=1}^{n_{0k}} x_{i,0}}{n_{0k}},$$

where $x_{i,j}$ is the observed value of $X_{i,j}$ for subject $i$ in arm $j$.

The statistical information is

$$\mathcal I_k^{-1} = \text{Var}(\delta_k) = \begin{cases} \dfrac{p(1-p)}{n_{1k}} + \dfrac{p(1-p)}{n_{0k}} & \text{under } H_0, \\[2ex] \dfrac{p_1(1-p_1)}{n_{1k}} + \dfrac{p_0(1-p_0)}{n_{0k}} & \text{under } H_1. \end{cases}$$

It is estimated by

$$\widehat{\mathcal I}_k^{-1} = \begin{cases} \dfrac{\bar p(1-\bar p)}{n_{1k}} + \dfrac{\bar p(1-\bar p)}{n_{0k}} & \text{under } H_0, \\[2ex] \dfrac{\widehat p_1(1-\widehat p_1)}{n_{1k}} + \dfrac{\widehat p_0(1-\widehat p_0)}{n_{0k}} & \text{under } H_1, \end{cases}$$

where

$$\bar p = \frac{\sum_{i=1}^{n_{1k}} x_{i,1} + \sum_{i=1}^{n_{0k}} x_{i,0}}{n_{1k} + n_{0k}}, \qquad \widehat p_j = \frac{\sum_{i=1}^{n_{jk}} x_{i,j}}{n_{jk}} \quad (j = 0, 1).$$
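The two estimates differ only in which failure rates enter the variance: the pooled $\bar p$ under $H_0$, the arm-specific $\widehat p_j$ under $H_1$. The sketch below computes $\widehat\delta_k$ and both information estimates from 0/1 data; `binary_summary` and the data are hypothetical.

```python
def binary_summary(x1, x0):
    """Estimate delta_k and I_k for a binary outcome (1 = failure).

    x1, x0: 0/1 outcomes in the experimental and control arms at analysis k.
    Returns (delta_hat, I_hat under H0, I_hat under H1).
    Assumes the failure rates are not all 0 or all 1, so the variances are nonzero.
    """
    n1, n0 = len(x1), len(x0)
    p1_hat = sum(x1) / n1
    p0_hat = sum(x0) / n0
    p_bar = (sum(x1) + sum(x0)) / (n1 + n0)  # pooled failure rate used under H0
    var_h0 = p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n0
    var_h1 = p1_hat * (1 - p1_hat) / n1 + p0_hat * (1 - p0_hat) / n0
    return p1_hat - p0_hat, 1 / var_h0, 1 / var_h1

# Hypothetical data: 4/20 failures (experimental) vs. 8/20 failures (control)
x1 = [1] * 4 + [0] * 16
x0 = [1] * 8 + [0] * 12
d, info_h0, info_h1 = binary_summary(x1, x0)
print(d, info_h0, info_h1)  # roughly -0.2, 47.62, 50.0
```

Here $\bar p = 0.3$, so the null variance is $0.21 \times (1/20 + 1/20) = 0.021$, while the alternative variance is $0.16/20 + 0.24/20 = 0.02$; the information under $H_1$ is slightly larger in this example.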

## Survival outcome

In many clinical trials, the outcome is the time to an event. For simplicity, assume the event is death, so each patient can have at most one event; the same ideas apply to events that can recur, but in that case we restrict attention to the first event for each patient. We use the logrank statistic to compare the treatment and control arms. Suppose there are $N_k$ deaths in total at analysis $k$. The numerator of the logrank statistic at analysis $k$ is (Proschan, Lan, and Wittes 2006)

$$\sum_{i=1}^{N_k} D_i,$$

where $D_i = O_i - E_i$, $O_i$ is the indicator that the $i$-th death occurred in a treatment patient, and $E_i = m_{1i} / (m_{0i} + m_{1i})$ is the null expectation of $O_i$ given the numbers $m_{0i}$ and $m_{1i}$ of control and treatment patients at risk just prior to the $i$-th death.
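To make the risk-set bookkeeping concrete, the sketch below walks through the data in time order, records $O_i$ and $E_i$ at each death, and sums $D_i = O_i - E_i$. It assumes no tied times and that a subject leaves the risk set right after its own death or censoring time; `logrank_terms` and the toy data are hypothetical.

```python
def logrank_terms(times, events, arms):
    """Return (O, E): lists of O_i and E_i over deaths in time order.

    times: follow-up times; events: 1 = death, 0 = censored;
    arms: 1 = treatment, 0 = control. Assumes no tied times.
    """
    order = sorted(range(len(times)), key=lambda j: times[j])
    m1 = sum(arms)            # treatment patients at risk
    m0 = len(arms) - m1       # control patients at risk
    O, E = [], []
    for j in order:
        if events[j] == 1:
            E.append(m1 / (m0 + m1))  # null expectation of O_i
            O.append(arms[j])         # 1 if the death is in the treatment arm
        # subject j leaves the risk set after its own time
        if arms[j] == 1:
            m1 -= 1
        else:
            m0 -= 1
    return O, E

# Hypothetical toy data: 5 patients, the one at time 3 is censored
times = [1, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1]
arms = [1, 0, 1, 0, 1]
O, E = logrank_terms(times, events, arms)
numerator = sum(o - e for o, e in zip(O, E))
print(O, E, numerator)  # O = [1, 0, 0, 1], E = [0.6, 0.5, 0.5, 1.0], sum of D_i about -0.6
```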

Conditional on $m_{0i}$ and $m_{1i}$, $O_i$ has a Bernoulli distribution with parameter $E_i$ under $H_0$. Hence the null conditional mean and variance of $D_i$ are $0$ and $V_i = E_i(1 - E_i)$, respectively.
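The mean and variance claim can be checked by exact enumeration: $D_i$ equals $1 - E_i$ with probability $E_i$ and $-E_i$ with probability $1 - E_i$. The sketch below does this with exact fractions for one hypothetical risk set; `null_moments_of_D` is an illustrative name.

```python
from fractions import Fraction

def null_moments_of_D(m0, m1):
    """Exact null conditional mean and variance of D = O - E given risk-set sizes."""
    E = Fraction(m1, m0 + m1)
    # (value of D, probability) when O ~ Bernoulli(E)
    outcomes = [(1 - E, E), (0 - E, 1 - E)]
    mean_d = sum(d * p for d, p in outcomes)
    var_d = sum((d - mean_d) ** 2 * p for d, p in outcomes)
    return E, mean_d, var_d

# Hypothetical risk set: 30 control and 20 treatment patients at risk
E, mean_d, var_d = null_moments_of_D(m0=30, m1=20)
print(mean_d, var_d, E * (1 - E))  # 0 6/25 6/25
```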

Unconditionally, the $D_i$ are uncorrelated, mean-zero random variables with variance $E(V_i)$ under the null hypothesis.
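A consequence of this property is $\text{Var}\left(\sum_i D_i\right) = E\left(\sum_i V_i\right)$ under $H_0$. The Monte Carlo sketch below checks that numerically. Its setup is an assumption made for illustration: 30 patients per arm, exponential survival with the same rate in both arms (so $H_0$ holds), no censoring, and a fixed seed; `logrank_sums` and `simulate` are hypothetical names.

```python
import random
from statistics import mean, variance

def logrank_sums(times, arms):
    """Return (sum of D_i, sum of V_i), assuming every subject dies and no ties."""
    order = sorted(range(len(times)), key=lambda j: times[j])
    m1 = sum(arms)
    m0 = len(arms) - m1
    sum_d = sum_v = 0.0
    for j in order:
        e = m1 / (m0 + m1)      # E_i from the current risk set
        sum_d += arms[j] - e    # D_i = O_i - E_i
        sum_v += e * (1 - e)    # V_i = E_i (1 - E_i)
        if arms[j] == 1:
            m1 -= 1
        else:
            m0 -= 1
    return sum_d, sum_v

def simulate(n_per_arm=30, reps=4000, seed=2024):
    rng = random.Random(seed)
    arms = [0] * n_per_arm + [1] * n_per_arm
    d_sums, v_sums = [], []
    for _ in range(reps):
        # H0: both arms share the same exponential hazard (rate 1)
        times = [rng.expovariate(1.0) for _ in arms]
        d, v = logrank_sums(times, arms)
        d_sums.append(d)
        v_sums.append(v)
    return variance(d_sums), mean(v_sums)

var_sum_d, mean_sum_v = simulate()
print(var_sum_d, mean_sum_v)  # the two numbers should be close
```

With 4000 replicates, the sample variance of $\sum_i D_i$ has a relative Monte Carlo error of roughly 2%, so the two printed values should agree to within a few percent.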

Thus, conditional on $N_k$ and taking $\delta_k = \sum_{i=1}^{N_k} D_i$, we have

$$\mathcal I_k^{-1} = \text{Var}(\delta_k) = \sum_{i=1}^{N_k} \text{Var}(D_i) = \sum_{i=1}^{N_k} E(V_i) = E\left(\sum_{i=1}^{N_k} V_i\right) = E\left(\sum_{i=1}^{N_k} E_i(1 - E_i)\right).$$

Let $r$ be the randomization ratio (treatment to control). Under $H_0$, the at-risk proportion $m_{1i}/(m_{0i} + m_{1i})$ stays approximately equal to $r/(1+r)$, so

$$\mathcal I_k^{-1} \approx \sum_{i=1}^{N_k} \frac{r}{1+r} \cdot \frac{1}{1+r} = \frac{N_k\, r}{(1+r)^2} \quad \text{under } H_0.$$

Under $H_1$, where the arms are depleted at different rates, the risk-set sizes are kept:

$$\mathcal I_k^{-1} = E\left(\sum_{i=1}^{N_k} \frac{m_{1i}}{m_{0i} + m_{1i}} \cdot \frac{m_{0i}}{m_{0i} + m_{1i}}\right) \quad \text{under } H_1.$$

It is estimated by

$$\widehat{\mathcal I}_k^{-1} = \begin{cases} \dfrac{N_k\, r}{(1+r)^2} & \text{under } H_0, \\[2ex] \displaystyle\sum_{i=1}^{N_k} \frac{m_{1i}}{m_{0i} + m_{1i}} \cdot \frac{m_{0i}}{m_{0i} + m_{1i}} & \text{under } H_1. \end{cases}$$
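The two estimates can be computed in a single pass over the data: the $H_0$ version needs only the death count $N_k$ and $r$, while the $H_1$ version accumulates $E_i(1 - E_i)$ from the observed risk sets. The sketch below follows this document's convention $\mathcal I_k = 1/\text{Var}(\delta_k)$ with $\delta_k = \sum_i D_i$. It assumes no tied times and at least one death with both arms at risk (otherwise the $H_1$ variance is zero); `logrank_info` and the toy data are hypothetical.

```python
def logrank_info(times, events, arms, r=1.0):
    """Return (I_hat under H0, I_hat under H1) for delta_k = sum of D_i.

    times: follow-up times; events: 1 = death, 0 = censored;
    arms: 1 = treatment, 0 = control; r: randomization ratio (treatment:control).
    Assumes no tied times.
    """
    order = sorted(range(len(times)), key=lambda j: times[j])
    m1 = sum(arms)
    m0 = len(arms) - m1
    n_deaths = 0
    var_h1 = 0.0
    for j in order:
        if events[j] == 1:
            n_deaths += 1
            var_h1 += (m1 / (m0 + m1)) * (m0 / (m0 + m1))  # E_i (1 - E_i)
        if arms[j] == 1:
            m1 -= 1
        else:
            m0 -= 1
    var_h0 = n_deaths * r / (1 + r) ** 2  # N_k r / (1 + r)^2
    return 1 / var_h0, 1 / var_h1

# Same hypothetical toy data as in the logrank sketch: 4 deaths, 1:1 randomization
info_h0, info_h1 = logrank_info([1, 2, 3, 4, 5], [1, 1, 0, 1, 1], [1, 0, 1, 0, 1])
print(info_h0, info_h1)  # 1.0 and roughly 1.351 (= 1 / 0.74)
```

The $H_1$ variance is smaller here because the last death occurs when only one arm is still at risk, so its $E_i(1 - E_i)$ term is zero.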