vignettes/articles/story-npe-background.Rmd

---
title: "Non-proportional effect size in group sequential design"
author: "Keaven M. Anderson"
output:
  rmarkdown::html_document:
    toc: true
    toc_float: true
    toc_depth: 2
    number_sections: true
    highlight: "textmate"
    css: "custom.css"
bibliography: "gsDesign2.bib"
vignette: >
  %\VignetteIndexEntry{Non-proportional effect size in group sequential design}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, message=FALSE, warning=FALSE, echo=FALSE}
library(tibble)
library(dplyr)
library(knitr)
library(gsDesign2)
```

# Overview

The acronym NPES is short for non-proportional effect size.
While it is motivated primarily by a use for when designing a time-to-event trial under non-proportional hazards (NPH), we have simplified and generalized the concept here. The model is likely to be useful for rank-based survival tests beyond the logrank test that will be considered initially by @tsiatis1982repeated.
It could also be useful in other situations where treatment effect may vary over time in a trial for some reason.
We generalize the framework of Chapter 2 of @proschan2006statistical to incorporate the possibility of the treatment effect changing during the course of a trial in some systematic way.
This vignettes addresses distribution theory and initial technical issues around computing

- boundary crossing probabilities
- bounds satisfying targeted boundary crossing probabilities

This is then applied to generalize computational algorithms provided in Chapter 19 of @jennison1999group that are used to compute boundary crossing probabilities as well as boundaries for group sequential designs.
Additional specifics around boundary computation, power and sample size are provided in a separate vignette.

# The probability model

## The continuous model and E-process

We consider a simple example here to motivate distribution theory that is quite general and applies across many situations.
For instance, @proschan2006statistical immediately suggest paired observations, time-to-event and binary outcomes as endpoints where the theory is applicable.

We assume for a given integer $N>0$ that $X_{i}$ are independent, $i=1,2,\ldots$.
For some integer $K\leq N$ we assume we will perform analysis $K$ times after $0<n_1<n_2,\ldots ,n_K = N$ observations are available for analysis.
Note that we have not confined $n\leq N$, but $N$ can be considered the final planned sample size.
@proschan2006statistical refer to the estimation or E-process which we extend here to

$$\hat{\theta}_k = \frac{\sum_{i=1}^{n_k} X_{i}}{n_k}\equiv \bar X_{k}.$$
While @proschan2006statistical has used $\delta$ instead of $\theta$ in our notation, we stick more closely to the notation of @jennison1999group where $\theta$ is used.
For our example, we see $\hat{\theta}_k\equiv\bar X_k$ represents the sample average at analysis $k$, $1\leq k\leq K$.
With a survival endpoint, $\hat\theta_k$ would typically represent a Cox model coefficient representing the logarithm of the hazard ratio for experimental vs control treatment and $n_k$ would represent the planned number of events at analysis $k$, $1\leq k\leq K.$
Denoting $t_k=n_k/N$, we assume that for some real-valued function $\theta(t)$ for $t \geq 0$ we have for $1\leq k\leq K$

$$E(\hat{\theta}_k) =\theta(t_k) =E(\bar X_k).$$
In the models of @proschan2006statistical and @jennison1999group we would have $\theta(t)$ equal to some constant $\theta$.
We assume further that for $i=1,2,\ldots$
$$\text{Var}(X_{i})=1.$$
The sample average variance under this assumption is for $1\leq k\leq K$

$$\text{Var}(\hat\theta(t_k))=\text{Var}(\bar X_k) =  1/ n_k.$$
The statistical information for the estimate $\hat\theta(t_k)$ for $1\leq k\leq K$ for this case is
$$ \mathcal{I}_k \equiv \frac{1}{\text{Var}(\hat\theta(t_k))} = n_k.$$
We now see that $t_k$, $1\leq k\leq K$ is the so-called information fraction at analysis $k$ in that
$t_k=\mathcal{I}_k/\mathcal{I}_K.$

## Z-process

The Z-process is commonly used (e.g., @jennison1999group) and will be used below to extend the computational algorithm in Chapter 19 of @jennison1999group by defining equivalently in the first and second lines below for $k=1,\ldots,K$

$$Z_{k} = \frac{\hat\theta_k}{\sqrt{\text{Var}(\hat\theta_k)}}= \sqrt{\mathcal{I}_k}\hat\theta_k= \sqrt{n_k}\bar X_k.$$

The variance for $1\leq k\leq K$ is
$$\text{Var}(Z_k) = 1$$
and the expected value is

$$E(Z_{k})= \sqrt{\mathcal{I}_k}\theta(t_{k})= \sqrt{n_k}E(\bar X_k) .$$

## B-process

B-values are mnemonic for Brownian motion.
For $1\leq k\leq K$ we define
$$B_{k}=\sqrt{t_k}Z_k$$
which implies
$$ E(B_{k}) = \sqrt{t_{k}\mathcal{I}_k}\theta(t_k) = t_k \sqrt{\mathcal{I}_K} \theta(t_k) = \mathcal{I}_k\theta(t_k)/\sqrt{\mathcal{I}_K}$$
and
$$\text{Var}(B_k) = t_k.$$

For our example, we have

$$B_k=\frac{1}{\sqrt N}\sum_{i=1}^{n_k}X_i.$$
It can be useful to think of $B_k$ as a sum of independent random variables.

## Summary of E-, Z- and B-processes

```{r, message=FALSE, warning=FALSE, echo=FALSE}
mt <- tibble::tribble(
  ~Statistic, ~Example, ~"Expected value", ~Variance,
  "$\\hat\\theta_k$", "$\\bar X_k$", "$\\theta(t_k)$", "$\\mathcal{I}_k^{-1}$",
  "$Z_k=\\sqrt{\\mathcal{I}_k}\\hat\\theta_k$",
  "$\\sqrt{n_k}\\bar X_k$", "$\\sqrt{\\mathcal{I}_k}\\theta(t_k)$",
  "$1$",
  "$B_k=\\sqrt{t_k}Z_k$", "$\\sum_{i=1}^{n_k}X_i/\\sqrt N$",
  "$t_k\\sqrt{\\mathcal{I}_K}\\
  \\theta(t_k)=\\mathcal{I}_k\\
  \\theta(t_k)/\\sqrt{\\mathcal{I}_K}$",
  "$t_k$"
)
mt %>% kable(escape = FALSE)
```

## Conditional independence, covariance and canonical form

We assume independent increments in the B-process.
That is, for $1\leq j < k\leq K$
$$\tag{1} B_k - B_j \sim \text{Normal} (\sqrt{\mathcal{I}_K}(t_k\theta(t_k)- t_j\theta(t_j)), t_k-t_j)$$
independent of $B_1,\ldots,B_j$.
As noted above, for a given $1\leq k\leq K$ we have for our example
$$B_j=\sum_{i=1}^{n_j}X_i / \sqrt N.$$
Because of independence of the sequence $X_i$, $i=1,2,\ldots$, we  immediately have for $1\leq j\leq k\leq K$
$$\text{Cov}(B_j,B_k) = \text{Var}(B_j) = t_j.$$
This leads further to
$$\text{Corr}(B_j,B_k)=\frac{t_j}{\sqrt{t_jt_k}}=\sqrt{t_j/t_k}=\text{Corr}(Z_j,Z_k)=\text{Cov}(Z_j,Z_k)$$
which is the covariance structure in the so-called *canonical form* of @jennison1999group.
For our example, we have
$$B_k=\frac{1}{\sqrt N}\sum_{i=1}^{n_k}X_i$$
and
$$B_k-B_j=\frac{1}{\sqrt N}\sum_{i=n_j + 1}^{n_k}X_i$$
and the covariance is obvious.
We assume independent increments in the B-process that will be demonstrated for the simple example above.
That is, for $1\leq j < k\leq K$
$$\tag{1} B_k - B_j \sim \text{Normal} (\mathcal{I}_k\theta(t_k)- \mathcal{I}_j\theta(t_j), t_k-t_j)$$
independent of $B_1,\ldots,B_j$.
For a given $1\leq j\leq k\leq K$ we have for our example
$$B_j=\sum_{i=1}^{n_j}X_i / (\sqrt N\sigma).$$
Because of independence of the sequence $X_i$, $i=1,2,\ldots$, we  immediately have for $1\leq j\leq k\leq K$
$$\text{Cov}(B_j,B_k) = \text{Var}(B_j) = t_j/t_k =\mathcal{I}_j/\mathcal{I}_k.$$
This leads to
$$\mathcal{I}_j/\mathcal{I}_k=\sqrt{t_j/t_k}=\text{Corr}(B_j,B_k)=\text{Corr}(Z_j,Z_k)=\text{Cov}(Z_j,Z_k)$$
which is the covariance structure in the so-called *canonical form* of @jennison1999group.
The independence of $B_j$ and
$$B_k-B_j=\sum_{i=n_j + 1}^{n_k}X_i/(\sqrt N\sigma)$$
is obvious for this example.

# Test bounds and crossing probabilities

In this section we define notation for bounds and boundary crossing probabilities for a group sequential design.
We also define an algorithm for computing bounds based on a targeted boundary crossing probability at each analysis.
The notation will be used elsewhere for defining one- and two-sided group sequential hypothesis testing.
A value of $\theta(t)>0$ will reflect a positive benefit.

For $k=1,2,\ldots,K-1$, interim cutoffs $-\infty \leq a_k< b_k\leq \infty$ are set; final cutoffs $-\infty \leq a_K\leq b_K <\infty$ are also set.
An infinite efficacy bound at an analysis means that bound cannot be crossed at that analysis.
Thus, $3K$ parameters define a group sequential design: $a_k$, $b_k$, and $\mathcal{I}_k$, $k=1,2,\ldots,K$.

### Notation for boundary crossing probabilities

We now apply the above distributional assumptions to compute boundary crossing probabilities.
We use a shorthand notation in this section to have $\theta$ represent $\theta()$ and $\theta=0$ to represent $\theta(t)\equiv 0$ for all $t$.
We denote the probability of crossing the upper boundary at analysis $k$ without previously crossing a bound by

$$\alpha_{k}(\theta)=P_{\theta}(\{Z_{k}\geq b_{k}\}\cap_{j=1}^{i-1}\{a_{j}\leq Z_{j}< b_{j}\}),$$
$k=1,2,\ldots,K.$

Next, we consider analogous notation for the lower bound. For $k=1,2,\ldots,K$
denote the probability of crossing a lower bound at analysis $k$ without previously crossing any bound by

$$\beta_{k}(\theta)=P_{\theta}((Z_{k}< a_{k}\}\cap_{j=1}^{k-1}\{ a_{j}\leq Z_{j}< b_{j}\}).$$
For symmetric testing for analysis $k$ we would have $a_k= - b_k$, $\beta_k(0)=\alpha_k(0),$ $k=1,2,\ldots,K$.
The total lower boundary crossing probability for a trial is denoted by
$$\beta(\theta)\equiv\sum_{k=1}^{K}\beta_{k}(\theta).$$
Note that we can also set $a_k= -\infty$ for any or all analyses if a lower bound is not desired, $k=1,2,\ldots,K$.
For $k<K$, we can set $b_k=\infty$ where an upper bound is not desired.
Obviously, for each $k$, we want either $a_k>-\infty$ or $b_k<\infty$.

# References