Update: causal inference chapter 8 & 9
HuangFuSL committed Dec 7, 2023
1 parent 1854abb commit d07b9d5
Showing 11 changed files with 281 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/math/causal-inference/.pages
@@ -8,3 +8,5 @@ nav:
- Interaction: chapter-5.md
- Graphical representation of causal effects: chapter-6.md
- Confounding: chapter-7.md
- Selection bias: chapter-8.md
- Measurement bias and "noncausal" diagrams: chapter-9.md
67 changes: 67 additions & 0 deletions docs/math/causal-inference/chapter-8.md
@@ -0,0 +1,67 @@
# Selection bias

Selection bias arises from the extra association introduced when only part of the population is selected for analysis. It is caused by conditioning on a common effect of treatment and outcome, and it can occur even if the treatment has no individual causal effect on the outcome.

On the causal graph, selection bias corresponds to conditioning on a collider or on a descendant of a collider. In practice, selection bias can appear in both observational studies and randomized experiments, since participants may be removed from the study before the outcome is observed. If participants are removed **not at random**, selection bias is introduced.

{{ latex_image('imgs/8-condition-on-common-effect.tex') }}

$$
\frac{P(Y = 1 | A = 1)}{P(Y = 1 | A = 0)} = \frac{P(Y^{a = 1} = 1)}{P(Y^{a = 0} = 1)} \not = \frac{P(Y = 1 | A = 1, L = 0)}{P(Y = 1 | A = 0, L = 0)}
$$

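As a quick numerical check, the sketch below simulates a simplified version of the graph above with a data-generating process assumed purely for illustration: $A$ is randomized and has *no* effect on $Y$, yet restricting the analysis to the stratum $L = 0$ induces an association between them.

```python
import numpy as np

# Minimal sketch (assumed data-generating process, not from the text):
# A has no causal effect on Y, but both affect the collider L.
rng = np.random.default_rng(0)
n = 1_000_000
A = rng.binomial(1, 0.5, n)                    # randomized treatment
Y = rng.binomial(1, 0.3, n)                    # outcome, unaffected by A
L = rng.binomial(1, 0.1 + 0.4 * A + 0.4 * Y)   # collider: affected by A and Y

def risk_ratio(y, a, mask=None):
    """P(Y=1 | A=1) / P(Y=1 | A=0), optionally within a selected subset."""
    if mask is not None:
        y, a = y[mask], a[mask]
    return y[a == 1].mean() / y[a == 0].mean()

print(risk_ratio(Y, A))           # ~1.0: no marginal association
print(risk_ratio(Y, A, L == 0))   # clearly below 1: association induced by selecting L = 0
```
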
!!! warning "Selection bias and hazard ratio"
**Hazard** is defined as the probability of a participant dying at a certain time. Following the definition of risk, the hazard ratio is defined in the same way as the risk ratio. Consider the following causal graph.

{{ latex_image('imgs/8-hazard-ratio.tex') }}

In the graph, the treatment $A$ denotes a heart transplant. The outcomes $Y_1$ and $Y_2$ denote death of the patient at times 1 and 2. The unmeasured variable $U$ affects the patient's overall risk of death. For each time, we can define the associational hazard ratio as

$$
\begin{aligned}
aRR_{AY_1} &= \frac{P(Y_1 = 1 | A = 1)}{P(Y_1 = 1 | A = 0)} \\
aRR_{AY_2} &= \frac{P(Y_2 = 1 | A = 1)}{P(Y_2 = 1 | A = 0)}
\end{aligned}
$$

However, the hazard ratio at the second time can only be measured among the patients who are still alive at the first time, that is:

$$
aRR_{AY_2 \mid Y_1 = 0} = \frac{P(Y_2 = 1 | A = 1, Y_1 = 0)}{P(Y_2 = 1 | A = 0, Y_1 = 0)}
$$

However, conditioning on $Y_1$ opens the path $A \ra Y_1 \la U \ra Y_2$. Therefore, unless $U$ is measured, we cannot tell from the collected data whether the path $A \ra Y_2$ exists.

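The bias can be seen in a small simulation sketch of the graph above, with parameter values assumed purely for illustration: there is no arrow from $A$ into $Y_2$, yet among survivors ($Y_1 = 0$) the treated appear to have a lower risk of $Y_2$.

```python
import numpy as np

# Minimal sketch (assumed parameter values): A affects death at time 1 (Y1),
# an unmeasured frailty U affects death at both times, and A has no arrow into Y2.
rng = np.random.default_rng(0)
n = 1_000_000
A = rng.binomial(1, 0.5, n)                                  # heart transplant
U = rng.binomial(1, 0.5, n)                                  # unmeasured frailty
Y1 = rng.binomial(1, 0.05 + 0.2 * A + 0.4 * U)               # death by time 1
Y2 = np.where(Y1 == 1, 1, rng.binomial(1, 0.05 + 0.4 * U))   # death by time 2

alive = Y1 == 0
rr = Y2[alive & (A == 1)].mean() / Y2[alive & (A == 0)].mean()
print(rr)   # below 1: A looks "protective" for Y2 among survivors, purely from selection
```
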
## Selection without Bias

Selection causes bias within the study, but in some cases the bias is restricted to certain strata. Consider the following causal graph, where $Y = 0$ if and only if $Y_A = Y_E = Y_O = 0$.

{{ latex_image('imgs/8-multiplicative-survival-model.tex') }}

1. $Y = 0$ implies $Y_A = 0$ and $Y_E = 0$. In this case, $A$ is independent of $E$ within the stratum $Y = 0$.
2. When $Y = 1$ and $Y_O = 0$, $Y_A = 0$ implies $Y_E = 1$ and vice versa. In this case, $A$ and $E$ are associated within the stratum (see the simulation sketch after this list).

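The sketch below checks both cases numerically under a multiplicative survival model with parameters assumed for illustration: $A$ and $E$ remain independent among survivors ($Y = 0$) but become associated within the stratum $Y = 1, Y_O = 0$.

```python
import numpy as np

# Minimal sketch (assumed data-generating process): Y = 1 unless all of
# Y_A, Y_E, Y_O are 0; Y_A is driven only by A and Y_E only by E.
rng = np.random.default_rng(0)
n = 2_000_000
A = rng.binomial(1, 0.5, n)
E = rng.binomial(1, 0.5, n)
YA = rng.binomial(1, 0.1 + 0.5 * A)   # death from the A-related cause
YE = rng.binomial(1, 0.1 + 0.5 * E)   # death from the E-related cause
YO = rng.binomial(1, 0.2, n)          # death from other causes
Y = ((YA + YE + YO) > 0).astype(int)

def assoc(a, e, mask):
    """P(E=1 | A=1) - P(E=1 | A=0) within the selected subset."""
    a, e = a[mask], e[mask]
    return e[a == 1].mean() - e[a == 0].mean()

print(assoc(A, E, Y == 0))                 # ~0: independent among survivors
print(assoc(A, E, (Y == 1) & (YO == 0)))   # clearly nonzero: associated in this stratum
```
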
## Adjustment for Selection Bias

Assume that positivity holds for $C = 0$ and that consistency holds for the analysis. Selection bias arises when participants are not removed from the study at random, so that the distribution of the remaining participants differs from that of the original population, i.e. the joint distribution $P'(A, L) = P(A, L \mid C = 0)$ is no longer identical to $P(A, L)$.

Selection bias is often unavoidable. IP weighting and standardization can be used to adjust for it. The inverse probability weight $W^C$ is defined as

$$
W^C = \frac{1}{P(C = 0 | \cdot)}
$$

where $\cdot$ denotes all the variables that directly affect $C$. Since variables are observed only for uncensored ($C = 0$) individuals, the IP weight uses only $C = 0$. IP weighting assigns a different weight to each pair $(A, L)$, so that the distribution of the weighted sample is identical to that of the original population.

$$
\begin{aligned}
&&& \frac{P(A = a, L = l, C = 0)}{P(C = 0 | A = a, L = l)} \\
&=&& \frac{P(A = a, L = l, C = 0)}{P(A = a, L = l, C = 0) / P(A = a, L = l)} \\
&=&& P(A = a, L = l)
\end{aligned}
$$

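A minimal sketch of the adjustment, under an assumed data-generating process: $A$ is randomized and has no effect on $Y$, while censoring $C$ depends on both $A$ and a cause $L$ of $Y$. The crude risk ratio among the uncensored is biased; weighting each uncensored individual by $\hat W^C = 1 / \hat P(C = 0 \mid A, L)$ removes the bias.

```python
import numpy as np

# Minimal sketch (assumed data-generating process): A randomized, no effect on Y;
# L affects both Y and censoring C; C also depends on A.
rng = np.random.default_rng(0)
n = 1_000_000
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.5, n)
Y = rng.binomial(1, 0.2 + 0.4 * L)
C = rng.binomial(1, 0.1 + 0.3 * A + 0.4 * L)   # censoring depends on A and L

# Estimate P(C = 0 | A, L) by stratum proportions and form the IP weights.
p_uncens = np.zeros(n)
for a in (0, 1):
    for l in (0, 1):
        idx = (A == a) & (L == l)
        p_uncens[idx] = (C[idx] == 0).mean()
W = 1.0 / p_uncens

obs = C == 0   # in practice Y is only observed for these individuals
crude = Y[obs & (A == 1)].mean() / Y[obs & (A == 0)].mean()
ipw = (np.average(Y[obs & (A == 1)], weights=W[obs & (A == 1)])
       / np.average(Y[obs & (A == 0)], weights=W[obs & (A == 0)]))
print(crude)   # != 1: selection bias among the uncensored
print(ipw)     # ~1: IP weighting recovers the null
```
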
!!! warning "Difference in confounding bias and selection bias"
In confounding bias, IP weighting is applied to the treatment variable $A$, while in selection bias, it is applied to the censoring variable $C$.

When there are measured variables $L$ on the path through $C$ that can block the path responsible for the selection bias, we can instead adjust for selection bias by stratification, i.e. by conditioning on $L$.
28 changes: 28 additions & 0 deletions docs/math/causal-inference/chapter-9.md
@@ -0,0 +1,28 @@
# Measurement bias and "noncausal" diagrams

## Measurement Bias

Measurement bias is caused by errors in measuring the values of variables, i.e. $A^* \not = A$. **Measurement error** is defined as the difference between the measured value and the true value of a variable, $e_A = A^* - A$. Taking measurement error into consideration, the causal diagram is modified as follows:

{{ latex_image('imgs/9-measurement-bias.tex') }}

Measurement error ideally satisfies two properties:

1. **Independence**: $e_A \perp e_Y$.
{{ latex_image('imgs/9-independence.tex') }}
2. **Nondifferentiality**: $e_A \perp Y$ and $e_Y \perp A$.
{{ latex_image('imgs/9-nondifferentiality.tex') }}

Lack of either property introduces extra association and leads to bias; a simulation sketch of the recall-bias case follows the list below.

* The edge $Y\ra U_A$ introduces **recall bias**.
* The edge $A\ra U_Y$ introduces **reverse causation bias**.
* The structure $U_A\la U_{AY}\ra U_Y$ makes the measurement errors **dependent**, violating independence.

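A small sketch of the recall-bias case (edge $Y \ra U_A$), with an error model assumed for illustration: $A$ has no effect on $Y$, but cases over-report exposure, so the mismeasured $A^*$ appears associated with $Y$.

```python
import numpy as np

# Minimal sketch (assumed error model): A has no effect on Y, but cases (Y = 1)
# falsely recall exposure, so the error in A* depends on Y (recall bias).
rng = np.random.default_rng(0)
n = 1_000_000
A = rng.binomial(1, 0.3, n)
Y = rng.binomial(1, 0.2, n)            # independent of A
flip_up = rng.binomial(1, 0.3 * Y)     # only cases can falsely report exposure
A_star = np.maximum(A, flip_up)        # mismeasured treatment A*

def risk_ratio(y, a):
    return y[a == 1].mean() / y[a == 0].mean()

print(risk_ratio(Y, A))        # ~1.0: no association with the true A
print(risk_ratio(Y, A_star))   # well above 1: spurious association with A*
```
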
Correcting for measurement error usually requires additional validation samples that are measured without bias.

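As one illustration of how a validation sample can be used (this particular correction is not from the text), the sketch below estimates the sensitivity and specificity of $A^*$ on a small validation subsample in which the true $A$ is known, and back-corrects the apparent prevalence of $A$ (the Rogan-Gladen correction).

```python
import numpy as np

# Minimal sketch (assumed measurement model): A* misclassifies A with
# sensitivity 0.9 and specificity 0.9; 5% of subjects form a validation sample.
rng = np.random.default_rng(0)
n = 100_000
A = rng.binomial(1, 0.3, n)
A_star = np.where(A == 1, rng.binomial(1, 0.9, n), rng.binomial(1, 0.1, n))

val = rng.random(n) < 0.05                  # validation subsample with true A observed
se = A_star[val & (A == 1)].mean()          # estimated sensitivity
sp = 1 - A_star[val & (A == 0)].mean()      # estimated specificity

p_apparent = A_star.mean()
p_corrected = (p_apparent + sp - 1) / (se + sp - 1)
print(p_apparent, p_corrected)              # corrected value is close to the true 0.3
```
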
## "Noncausal" Diagrams

A causal graph requires that *all* edges in the graph can be interpreted causally, together with well-defined interventions. In a graph that contains non-causal edges, adjustment may fail to remove bias, because the adjusted variable is not part of the true causal structure.

{{ latex_image('imgs/9-noncausal.tex') }}
37 changes: 37 additions & 0 deletions docs/math/causal-inference/imgs/8-condition-on-common-effect.tex
@@ -0,0 +1,37 @@
\documentclass[standalone, version=2.0]{huangfusl-template}
\begin{document}
\begin{tabular}{cc}
\begin{tikzpicture}
\node[circle, draw, fill=gray!50!white] (L) at (0, 0) {$L$};
\node[circle, draw] (A) at (-1, -1.7) {$A$};
\node[circle, draw] (Y) at (1, -1.7) {$Y$};

\draw[stealth-, thick] (L) -- (A);
\draw[stealth-, thick] (L) -- (Y);
\draw[-stealth, thick] (A) -- (Y);

\draw[dashed, -stealth, thick, draw=red] ($ (A) + (0, -0.7) $) -- ($ (Y) + (0, -0.7) $);
\node (causal) at ($ (A)!0.5!(Y) + (0, -1.2) $) {\color{red}causation flow};
\draw[dashed, -stealth, thick, draw=blue] ($ (A) + (-0.7, 0) $) .. controls (-0.5, 1.4) and (0.5, 1.4) .. ($ (Y) + (0.7, 0) $);
\node (asso) at ($ (L) + (0, 1.2) $) {\color{blue}association flow};
\end{tikzpicture} &
\qquad
\begin{tikzpicture}
\node[circle, draw] (L) at (0, 0) {$L$};
\node[circle, draw, fill=gray!50!white] (S) at (2, 0) {$S$};
\node[circle, draw] (A) at (-1, -1.7) {$A$};
\node[circle, draw] (Y) at (1, -1.7) {$Y$};

\draw[stealth-, thick] (L) -- (A);
\draw[stealth-, thick] (L) -- (Y);
\draw[-stealth, thick] (L) -- (S);
\draw[-stealth, thick] (A) -- (Y);

\draw[dashed, -stealth, thick, draw=red] ($ (A) + (0, -0.7) $) -- ($ (Y) + (0, -0.7) $);
\node (causal) at ($ (A)!0.5!(Y) + (0, -1.2) $) {\color{red}causation flow};
\draw[dashed, -stealth, thick, draw=blue] ($ (A) + (-0.7, 0) $) .. controls (-0.5, 1.4) and (0.5, 1.4) .. ($ (Y) + (0.7, 0) $);
\node (asso) at ($ (L) + (0, 1.2) $) {\color{blue}association flow};
\end{tikzpicture}
\end{tabular}
\end{document}
14 changes: 14 additions & 0 deletions docs/math/causal-inference/imgs/8-hazard-ratio.tex
@@ -0,0 +1,14 @@
\documentclass[standalone, version=2.0]{huangfusl-template}
\begin{document}
\begin{tikzpicture}
\node[circle, draw, fill=gray!50!white] (Y1) at (2, 0) {$Y_1$};
\node[circle, draw] (A) at (0, 0) {$A$};
\node[circle, draw] (Y2) at (4, 0) {$Y_2$};
\node[circle, draw] (U) at (3, 1.7) {$U$};

\draw[-stealth, thick] (A) -- (Y1);
\draw[-stealth, thick] (Y1) -- (Y2);
\draw[-stealth, thick] (U) -- (Y1);
\draw[-stealth, thick] (U) -- (Y2);
\end{tikzpicture}
\end{document}
17 changes: 17 additions & 0 deletions docs/math/causal-inference/imgs/8-multiplicative-survival-model.tex
@@ -0,0 +1,17 @@
\documentclass[standalone, version=2.0]{huangfusl-template}
\begin{document}
\begin{tikzpicture}
\node[circle, draw] (A) at (-2, 0) {$A$};
\node[circle, draw] (E) at (-2, -2) {$E$};
\node[circle, draw, fill=gray!50!white] (YE) at (0, -2) {$Y_E$};
\node[circle, draw, fill=gray!50!white] (YA) at (0, 0) {$Y_A$};
\node[circle, draw, fill=gray!50!white] (YO) at (0, 2) {$Y_O$};
\node[circle, draw, fill=gray!50!white] (Y) at (2, 0) {$Y$};

\foreach \i in {A, E} {
\draw[-stealth, thick] (\i) -- (Y\i);
\draw[-stealth, thick] (Y\i) -- (Y);
}
\draw[-stealth, thick] (YO) -- (Y);
\end{tikzpicture}
\end{document}
36 changes: 36 additions & 0 deletions docs/math/causal-inference/imgs/9-independence.tex
@@ -0,0 +1,36 @@
\documentclass[standalone, version=2.0]{huangfusl-template}
\begin{document}
\begin{tabular}{cc}
\begin{tikzpicture}
\node[draw, circle] (As) at (0, 0) {$A^*$};
\node[draw, circle] (Ys) at (3, 0) {$Y^*$};
\node[draw, circle] (A) at (0, -2) {$A$};
\node[draw, circle] (Y) at (3, -2) {$Y$};
\node[draw, circle] (UY) at (3, 2) {$U_Y$};
\node[draw, circle] (UA) at (0, 2) {$U_A$};

\draw[-stealth, thick] (A) -- (Y);
\foreach \i in {A, Y} {
\draw[-stealth, thick] (U\i) -- (\i s);
\draw[-stealth, thick] (\i) -- (\i s);
}
\end{tikzpicture} &
\begin{tikzpicture}
\node[draw, circle] (UAY) at (1.5, 3) {$U_{AY}$};
\node[draw, circle] (As) at (0, 0) {$A^*$};
\node[draw, circle] (Ys) at (3, 0) {$Y^*$};
\node[draw, circle] (A) at (0, -2) {$A$};
\node[draw, circle] (Y) at (3, -2) {$Y$};
\node[draw, circle] (UY) at (3, 2) {$U_Y$};
\node[draw, circle] (UA) at (0, 2) {$U_A$};

\draw[-stealth, thick] (A) -- (Y);
\foreach \i in {A, Y} {
\draw[-stealth, thick] (UAY) -- (U\i);
\draw[-stealth, thick] (U\i) -- (\i s);
\draw[-stealth, thick] (\i) -- (\i s);
}
\end{tikzpicture} \\
independence & without independence
\end{tabular}
\end{document}
17 changes: 17 additions & 0 deletions docs/math/causal-inference/imgs/9-measurement-bias.tex
@@ -0,0 +1,17 @@
\documentclass[standalone, version=2.0]{huangfusl-template}
\begin{document}
\begin{tikzpicture}
\node[draw, circle] (As) at (0, 0) {$A^*$};
\node[draw, circle] (Ys) at (3, 0) {$Y^*$};
\node[draw, circle] (A) at (0, -2) {$A$};
\node[draw, circle] (Y) at (3, -2) {$Y$};
\node[draw, circle] (UY) at (3, 2) {$U_Y$};
\node[draw, circle] (UA) at (0, 2) {$U_A$};

\draw[-stealth, thick] (A) -- (Y);
\foreach \i in {A, Y} {
\draw[-stealth, thick] (U\i) -- (\i s);
\draw[-stealth, thick] (\i) -- (\i s);
}
\end{tikzpicture}
\end{document}
26 changes: 26 additions & 0 deletions docs/math/causal-inference/imgs/9-noncausal.tex
@@ -0,0 +1,26 @@
\documentclass[standalone, version=2.0]{huangfusl-template}
\begin{document}
\begin{tabular}{cc}
\begin{tikzpicture}
\node[draw, circle] (A) at (-1, 0) {$A$};
\node[draw, circle] (Y) at (1, 0) {$Y$};
\node[draw, circle] (L) at (0, 1.7) {$L$};

\draw[-stealth, thick] (A) -- (L);
\draw[-stealth, thick] (A) -- (Y);
\draw[-stealth, thick] (L) -- (Y);
\end{tikzpicture} &
\begin{tikzpicture}
\node[draw, circle] (A) at (-1, 0) {$A$};
\node[draw, circle] (Y) at (1, 0) {$Y$};
\node[draw, circle] (U) at (0, 1.7) {$U$};
\node[draw, circle] (L) at (2, 1.7) {$L$};

\draw[-stealth, thick] (A) -- (U);
\draw[-stealth, thick] (A) -- (Y);
\draw[-stealth, thick] (U) -- (Y);
\draw[-stealth, thick] (U) -- (L);
\end{tikzpicture} \\
proposed graph & true graph
\end{tabular}
\end{document}
35 changes: 35 additions & 0 deletions docs/math/causal-inference/imgs/9-nondifferentiality.tex
@@ -0,0 +1,35 @@
\documentclass[standalone, version=2.0]{huangfusl-template}
\begin{document}
\begin{tabular}{cc}
\begin{tikzpicture}
\node[draw, circle] (As) at (0, 0) {$A^*$};
\node[draw, circle] (Ys) at (3, 0) {$Y^*$};
\node[draw, circle] (A) at (0, -2) {$A$};
\node[draw, circle] (Y) at (3, -2) {$Y$};
\node[draw, circle] (UY) at (3, 2) {$U_Y$};
\node[draw, circle] (UA) at (0, 2) {$U_A$};

\draw[-stealth, thick] (A) -- (Y);
\foreach \i in {A, Y} {
\draw[-stealth, thick] (U\i) -- (\i s);
\draw[-stealth, thick] (\i) -- (\i s);
}
\end{tikzpicture} &
\begin{tikzpicture}
\node[draw, circle] (As) at (0, 0) {$A^*$};
\node[draw, circle] (Ys) at (3, 0) {$Y^*$};
\node[draw, circle] (A) at (0, -2) {$A$};
\node[draw, circle] (Y) at (3, -2) {$Y$};
\node[draw, circle] (UY) at (3, 2) {$U_Y$};
\node[draw, circle] (UA) at (0, 2) {$U_A$};

\draw[-stealth, thick] (A) -- (Y);
\draw[-stealth, thick] (Y) -- (UA);
\foreach \i in {A, Y} {
\draw[-stealth, thick] (U\i) -- (\i s);
\draw[-stealth, thick] (\i) -- (\i s);
}
\end{tikzpicture} \\
nondifferentiality & without nondifferentiality
\end{tabular}
\end{document}
2 changes: 2 additions & 0 deletions docs/math/causal-inference/index.md
@@ -9,3 +9,5 @@ According to *Causal Inference: What If*.
5. [Interaction](chapter-5.md)
6. [Graphical representation of causal effects](chapter-6.md)
7. [Confounding](chapter-7.md)
8. [Selection bias](chapter-8.md)
9. [Measurement bias and "noncausal" diagrams](chapter-9.md)
