-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathstory-npe-background.Rmd
186 lines (151 loc) · 9.32 KB
/
story-npe-background.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
---
title: "Non-proportional effect size in group sequential design"
author: "Keaven M. Anderson"
output:
rmarkdown::html_document:
toc: true
toc_float: true
toc_depth: 2
number_sections: true
highlight: "textmate"
css: "custom.css"
bibliography: "gsDesign2.bib"
vignette: >
%\VignetteIndexEntry{Non-proportional effect size in group sequential design}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
---
```{r, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r, message=FALSE, warning=FALSE, echo=FALSE}
library(tibble)
library(dplyr)
library(knitr)
library(gsDesign2)
```
# Overview
The acronym NPES is short for non-proportional effect size.
While it is motivated primarily by a use for when designing a time-to-event trial under non-proportional hazards (NPH), we have simplified and generalized the concept here. The model is likely to be useful for rank-based survival tests beyond the logrank test that will be considered initially by @tsiatis1982repeated.
It could also be useful in other situations where treatment effect may vary over time in a trial for some reason.
We generalize the framework of Chapter 2 of @proschan2006statistical to incorporate the possibility of the treatment effect changing during the course of a trial in some systematic way.
This vignettes addresses distribution theory and initial technical issues around computing
- boundary crossing probabilities
- bounds satisfying targeted boundary crossing probabilities
This is then applied to generalize computational algorithms provided in Chapter 19 of @jennison1999group that are used to compute boundary crossing probabilities as well as boundaries for group sequential designs.
Additional specifics around boundary computation, power and sample size are provided in a separate vignette.
# The probability model
## The continuous model and E-process
We consider a simple example here to motivate distribution theory that is quite general and applies across many situations.
For instance, @proschan2006statistical immediately suggest paired observations, time-to-event and binary outcomes as endpoints where the theory is applicable.
We assume for a given integer $N>0$ that $X_{i}$ are independent, $i=1,2,\ldots$.
For some integer $K\leq N$ we assume we will perform analysis $K$ times after $0<n_1<n_2,\ldots ,n_K = N$ observations are available for analysis.
Note that we have not confined $n\leq N$, but $N$ can be considered the final planned sample size.
@proschan2006statistical refer to the estimation or E-process which we extend here to
$$\hat{\theta}_k = \frac{\sum_{i=1}^{n_k} X_{i}}{n_k}\equiv \bar X_{k}.$$
While @proschan2006statistical has used $\delta$ instead of $\theta$ in our notation, we stick more closely to the notation of @jennison1999group where $\theta$ is used.
For our example, we see $\hat{\theta}_k\equiv\bar X_k$ represents the sample average at analysis $k$, $1\leq k\leq K$.
With a survival endpoint, $\hat\theta_k$ would typically represent a Cox model coefficient representing the logarithm of the hazard ratio for experimental vs control treatment and $n_k$ would represent the planned number of events at analysis $k$, $1\leq k\leq K.$
Denoting $t_k=n_k/N$, we assume that for some real-valued function $\theta(t)$ for $t \geq 0$ we have for $1\leq k\leq K$
$$E(\hat{\theta}_k) =\theta(t_k) =E(\bar X_k).$$
In the models of @proschan2006statistical and @jennison1999group we would have $\theta(t)$ equal to some constant $\theta$.
We assume further that for $i=1,2,\ldots$
$$\text{Var}(X_{i})=1.$$
The sample average variance under this assumption is for $1\leq k\leq K$
$$\text{Var}(\hat\theta(t_k))=\text{Var}(\bar X_k) = 1/ n_k.$$
The statistical information for the estimate $\hat\theta(t_k)$ for $1\leq k\leq K$ for this case is
$$ \mathcal{I}_k \equiv \frac{1}{\text{Var}(\hat\theta(t_k))} = n_k.$$
We now see that $t_k$, $1\leq k\leq K$ is the so-called information fraction at analysis $k$ in that
$t_k=\mathcal{I}_k/\mathcal{I}_K.$
## Z-process
The Z-process is commonly used (e.g., @jennison1999group) and will be used below to extend the computational algorithm in Chapter 19 of @jennison1999group by defining equivalently in the first and second lines below for $k=1,\ldots,K$
$$Z_{k} = \frac{\hat\theta_k}{\sqrt{\text{Var}(\hat\theta_k)}}= \sqrt{\mathcal{I}_k}\hat\theta_k= \sqrt{n_k}\bar X_k.$$
The variance for $1\leq k\leq K$ is
$$\text{Var}(Z_k) = 1$$
and the expected value is
$$E(Z_{k})= \sqrt{\mathcal{I}_k}\theta(t_{k})= \sqrt{n_k}E(\bar X_k) .$$
## B-process
B-values are mnemonic for Brownian motion.
For $1\leq k\leq K$ we define
$$B_{k}=\sqrt{t_k}Z_k$$
which implies
$$ E(B_{k}) = \sqrt{t_{k}\mathcal{I}_k}\theta(t_k) = t_k \sqrt{\mathcal{I}_K} \theta(t_k) = \mathcal{I}_k\theta(t_k)/\sqrt{\mathcal{I}_K}$$
and
$$\text{Var}(B_k) = t_k.$$
For our example, we have
$$B_k=\frac{1}{\sqrt N}\sum_{i=1}^{n_k}X_i.$$
It can be useful to think of $B_k$ as a sum of independent random variables.
## Summary of E-, Z- and B-processes
```{r, message=FALSE, warning=FALSE, echo=FALSE}
mt <- tibble::tribble(
~Statistic, ~Example, ~"Expected value", ~Variance,
"$\\hat\\theta_k$", "$\\bar X_k$", "$\\theta(t_k)$", "$\\mathcal{I}_k^{-1}$",
"$Z_k=\\sqrt{\\mathcal{I}_k}\\hat\\theta_k$",
"$\\sqrt{n_k}\\bar X_k$", "$\\sqrt{\\mathcal{I}_k}\\theta(t_k)$",
"$1$",
"$B_k=\\sqrt{t_k}Z_k$", "$\\sum_{i=1}^{n_k}X_i/\\sqrt N$",
"$t_k\\sqrt{\\mathcal{I}_K}\\
\\theta(t_k)=\\mathcal{I}_k\\
\\theta(t_k)/\\sqrt{\\mathcal{I}_K}$",
"$t_k$"
)
mt %>% kable(escape = FALSE)
```
## Conditional independence, covariance and canonical form
We assume independent increments in the B-process.
That is, for $1\leq j < k\leq K$
$$\tag{1} B_k - B_j \sim \text{Normal} (\sqrt{\mathcal{I}_K}(t_k\theta(t_k)- t_j\theta(t_j)), t_k-t_j)$$
independent of $B_1,\ldots,B_j$.
As noted above, for a given $1\leq k\leq K$ we have for our example
$$B_j=\sum_{i=1}^{n_j}X_i / \sqrt N.$$
Because of independence of the sequence $X_i$, $i=1,2,\ldots$, we immediately have for $1\leq j\leq k\leq K$
$$\text{Cov}(B_j,B_k) = \text{Var}(B_j) = t_j.$$
This leads further to
$$\text{Corr}(B_j,B_k)=\frac{t_j}{\sqrt{t_jt_k}}=\sqrt{t_j/t_k}=\text{Corr}(Z_j,Z_k)=\text{Cov}(Z_j,Z_k)$$
which is the covariance structure in the so-called *canonical form* of @jennison1999group.
For our example, we have
$$B_k=\frac{1}{\sqrt N}\sum_{i=1}^{n_k}X_i$$
and
$$B_k-B_j=\frac{1}{\sqrt N}\sum_{i=n_j + 1}^{n_k}X_i$$
and the covariance is obvious.
We assume independent increments in the B-process that will be demonstrated for the simple example above.
That is, for $1\leq j < k\leq K$
$$\tag{1} B_k - B_j \sim \text{Normal} (\mathcal{I}_k\theta(t_k)- \mathcal{I}_j\theta(t_j), t_k-t_j)$$
independent of $B_1,\ldots,B_j$.
For a given $1\leq j\leq k\leq K$ we have for our example
$$B_j=\sum_{i=1}^{n_j}X_i / (\sqrt N\sigma).$$
Because of independence of the sequence $X_i$, $i=1,2,\ldots$, we immediately have for $1\leq j\leq k\leq K$
$$\text{Cov}(B_j,B_k) = \text{Var}(B_j) = t_j/t_k =\mathcal{I}_j/\mathcal{I}_k.$$
This leads to
$$\mathcal{I}_j/\mathcal{I}_k=\sqrt{t_j/t_k}=\text{Corr}(B_j,B_k)=\text{Corr}(Z_j,Z_k)=\text{Cov}(Z_j,Z_k)$$
which is the covariance structure in the so-called *canonical form* of @jennison1999group.
The independence of $B_j$ and
$$B_k-B_j=\sum_{i=n_j + 1}^{n_k}X_i/(\sqrt N\sigma)$$
is obvious for this example.
# Test bounds and crossing probabilities
In this section we define notation for bounds and boundary crossing probabilities for a group sequential design.
We also define an algorithm for computing bounds based on a targeted boundary crossing probability at each analysis.
The notation will be used elsewhere for defining one- and two-sided group sequential hypothesis testing.
A value of $\theta(t)>0$ will reflect a positive benefit.
For $k=1,2,\ldots,K-1$, interim cutoffs $-\infty \leq a_k< b_k\leq \infty$ are set; final cutoffs $-\infty \leq a_K\leq b_K <\infty$ are also set.
An infinite efficacy bound at an analysis means that bound cannot be crossed at that analysis.
Thus, $3K$ parameters define a group sequential design: $a_k$, $b_k$, and $\mathcal{I}_k$, $k=1,2,\ldots,K$.
### Notation for boundary crossing probabilities
We now apply the above distributional assumptions to compute boundary crossing probabilities.
We use a shorthand notation in this section to have $\theta$ represent $\theta()$ and $\theta=0$ to represent $\theta(t)\equiv 0$ for all $t$.
We denote the probability of crossing the upper boundary at analysis $k$ without previously crossing a bound by
$$\alpha_{k}(\theta)=P_{\theta}(\{Z_{k}\geq b_{k}\}\cap_{j=1}^{i-1}\{a_{j}\leq Z_{j}< b_{j}\}),$$
$k=1,2,\ldots,K.$
Next, we consider analogous notation for the lower bound. For $k=1,2,\ldots,K$
denote the probability of crossing a lower bound at analysis $k$ without previously crossing any bound by
$$\beta_{k}(\theta)=P_{\theta}((Z_{k}< a_{k}\}\cap_{j=1}^{k-1}\{ a_{j}\leq Z_{j}< b_{j}\}).$$
For symmetric testing for analysis $k$ we would have $a_k= - b_k$, $\beta_k(0)=\alpha_k(0),$ $k=1,2,\ldots,K$.
The total lower boundary crossing probability for a trial is denoted by
$$\beta(\theta)\equiv\sum_{k=1}^{K}\beta_{k}(\theta).$$
Note that we can also set $a_k= -\infty$ for any or all analyses if a lower bound is not desired, $k=1,2,\ldots,K$.
For $k<K$, we can set $b_k=\infty$ where an upper bound is not desired.
Obviously, for each $k$, we want either $a_k>-\infty$ or $b_k<\infty$.
# References