Commit: fixing...
WKaiZ committed Nov 1, 2024
1 parent 006a1b1 commit c37c202
Showing 8 changed files with 65 additions and 98 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/generate-pdf.yml
@@ -64,6 +64,7 @@ jobs:
"rl/eae.md"
"rl/summary.md"
"bayes-nets/index.md"
"bayes-nets/probability.md"
"bayes-nets/inference.md"
"bayes-nets/representation.md"
"bayes-nets/structure.md"
@@ -150,6 +151,7 @@ jobs:
"pdf_output/rl_eae.pdf" \
"pdf_output/rl_summary.pdf" \
"pdf_output/bayes-nets_index.pdf" \
"pdf_output/bayes-nets_probability.pdf" \
"pdf_output/bayes-nets_inference.pdf" \
"pdf_output/bayes-nets_representation.pdf" \
"pdf_output/bayes-nets_structure.pdf" \
2 changes: 2 additions & 0 deletions bayes-nets/approximate.md
@@ -3,6 +3,8 @@ title: "6.7 Approximate Inference in Bayes Nets: Sampling"
parent: 6. Bayes Nets
nav_order: 7
layout: page
header-includes:
\pagenumbering{gobble}
---

# 6.7 Approximate Inference in Bayes Nets: Sampling
71 changes: 26 additions & 45 deletions bayes-nets/d-separation.md
@@ -3,6 +3,8 @@ title: "6.5 D-Separation"
parent: 6. Bayes Nets
nav_order: 5
layout: page
header-includes:
\pagenumbering{gobble}
---

# 6.4 D-Separation
@@ -22,44 +24,38 @@ We will present all three canonical cases of connected three-node two-edge Bayes
*Figure 2: Causal Chain with Y observed.*

Figure 1 is a configuration of three nodes known as a **causal chain**. It expresses the following representation of the joint distribution over $$X$$, $$Y$$, and $$Z$$:

$$P(x, y, z) = P(z|y)P(y|x)P(x)$$

It's important to note that $$X$$ and $$Z$$ are not guaranteed to be independent, as shown by the following counterexample:

$$P(y|x) =
\begin{cases}
1 & \text{if } x = y \\
0 & \text{else}
\end{cases}$$

$$P(z|y) =
\begin{cases}
1 & \text{if } z = y \\
0 & \text{else}
\end{cases}$$

<p>
</p>
In this case, $$P(z|x) = 1$$ if $$x = z$$ and $$0$$ otherwise, so $$X$$ and $$Z$$ are not independent.

However, we can make the statement that $$X \perp\!\!\!\perp Z | Y$$, as in Figure 2. Recall that this conditional independence means:

$$P(X | Z, Y) = P(X | Y)$$

We can prove this statement as follows:

$$P(X | Z, y) = \frac{P(X, Z, y)}{P(Z, y)}
= \frac{P(Z|y) P(y|X) P(X)}{\sum_{x} P(x, y, Z)}
= \frac{P(Z|y) P(y|X) P(X)}{P(Z|y) \sum_{x} P(y|x)P(x)}
= \frac{P(y|X) P(X)}{\sum_{x} P(y|x)P(x)}
= P(X|y)$$

<p>
</p>
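The proof can also be checked numerically. Below is a minimal Python sketch, assuming made-up, non-degenerate CPTs for the chain $$X \rightarrow Y \rightarrow Z$$ (none of these numbers come from the notes):

```python
from itertools import product

# Causal chain X -> Y -> Z with assumed, non-degenerate CPTs.
p_x = {0: 0.4, 1: 0.6}
p_y_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_y_given_x[x][y]
p_z_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p_z_given_y[y][z]

joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x, y, z in product((0, 1), repeat=3)}

def p(**fixed):
    """Sum the joint over every entry consistent with the fixed values."""
    return sum(pr for (x, y, z), pr in joint.items()
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))

# X and Z are dependent: P(Z=1 | X=0) differs from P(Z=1 | X=1).
print(p(x=0, z=1) / p(x=0), p(x=1, z=1) / p(x=1))  # 0.22 vs 0.42

# But X is independent of Z given Y: P(Z | X, Y) = P(Z | Y) for every assignment.
for x, y, z in product((0, 1), repeat=3):
    assert abs(p(x=x, y=y, z=z) / p(x=x, y=y) - p(y=y, z=z) / p(y=y)) < 1e-12
```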
@@ -77,26 +73,21 @@ An analogous proof can be used to show the same thing for the case where $$X$$ h

Another possible configuration for a triple is the **common cause**. It expresses the following representation:

$$P(x, y, z) = P(x|y)P(z|y)P(y)$$

Just like with the causal chain, we can show that $$X$$ is not guaranteed to be independent of $$Z$$ with the following counterexample distribution:

$$P(x|y) =
\begin{cases}
1 & \text{if } x = y \\
0 & \text{else}
\end{cases}$$

$$P(z|y) =
\begin{cases}
1 & \text{if } z = y \\
0 & \text{else}
\end{cases}$$

<p>
</p>
@@ -105,9 +96,7 @@ Then $$P(x|z) = 1$$ if $$x = z$$ and $$0$$ otherwise, so $$X$$ and $$Z$$ are not
</p>
But it is true that $$X \perp\!\!\!\perp Z | Y$$. That is, $$X$$ and $$Z$$ are independent if $$Y$$ is observed as in Figure 4. We can show this as follows:

$$P(X | Z, y) = \frac{P(X, Z, y)}{P(Z, y)} = \frac{P(X|y) P(Z|y) P(y)}{P(Z|y) P(y)} = P(X|y)$$

## 6.4.3 Common Effect

@@ -121,30 +110,22 @@

It expresses the representation:

$$P(x, y, z) = P(y|x,z)P(x)P(z)$$

In the configuration shown in Figure 5, $$X$$ and $$Z$$ are independent: $$X \perp\!\!\!\perp Z$$. However, they are not necessarily independent when conditioned on $$Y$$ (Figure 6). As an example, suppose all three are binary variables. $$X$$ and $$Z$$ are true and false with equal probability:

$$P(X=true) = P(X=false) = 0.5$$

$$P(Z=true) = P(Z=false) = 0.5$$

and $$Y$$ is determined by whether $$X$$ and $$Z$$ have the same value:

$$P(Y | X, Z) =
\begin{cases}
1 & \text{if } X = Z \text{ and } Y = true \\
1 & \text{if } X \ne Z \text{ and } Y = false \\
0 & \text{else}
\end{cases}$$

Then $$X$$ and $$Z$$ are independent if $$Y$$ is unobserved. But if $$Y$$ is observed, then knowing $$X$$ will tell us the value of $$Z$$, and vice-versa. So $$X$$ and $$Z$$ are *not* conditionally independent given $$Y$$.
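This "explaining away" behavior is easy to verify numerically. A small sketch of the coin-flip example above (the code and variable names are illustrative, not part of the notes):

```python
from itertools import product

# X and Z are independent fair coin flips; Y records whether they match.
joint = {}
for x, z in product((0, 1), repeat=2):
    y = 1 if x == z else 0          # Y is deterministic given X and Z
    joint[(x, y, z)] = 0.5 * 0.5    # P(X=x) * P(Z=z)

def p(**fixed):
    return sum(pr for (x, y, z), pr in joint.items()
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))

# With Y unobserved, X carries no information about Z: both values are 0.5.
print(p(x=0, z=1) / p(x=0), p(x=1, z=1) / p(x=1))

# Once Y is observed, X pins down Z exactly.
print(p(x=1, y=1, z=1) / p(x=1, y=1))  # 1.0
print(p(x=0, y=1, z=1) / p(x=0, y=1))  # 0.0
```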
<p>
18 changes: 6 additions & 12 deletions bayes-nets/elimination.md
@@ -3,6 +3,8 @@ title: '6.6 Exact Inference in Bayes Nets'
parent: 6. Bayes Nets
nav_order: 6
layout: page
header-includes:
\pagenumbering{gobble}
---

# 6.6 Exact Inference in Bayes Nets
@@ -54,31 +56,23 @@ Alternatively, we can write $$P(C, +e | T, S)$$, even if this is not guaranteed

This approach to writing factors is grounded in repeated applications of the chain rule. In the example above, we know that we can't have a variable on both sides of the conditional bar. Also, we know:

$$P(T, C, S, +e) = P(T) P(S | T) P(C | T) P(+e | C, S) = P(S, T) P(C | T) P(+e | C, S)$$

and so:

$$P(C | T) P(+e | C, S) = \frac{P(T, C, S, +e)}{P(S, T)} = P(C, +e | T, S)$$

While the variable elimination process is more involved conceptually, the maximum size of any factor generated is only 8 rows instead of 16, as it would be if we formed the entire joint PDF.

<p>
</p>
An alternate way of looking at the problem is to observe that the calculation of $$P(T|+e)$$ can either be done through inference by enumeration as follows:

$$\alpha \sum_s{\sum_c{P(T)P(s|T)P(c|T)P(+e|c,s)}}$$

or by variable elimination as follows:

$$\alpha P(T)\sum_s{P(s|T)\sum_c{P(c|T)P(+e|c,s)}}$$

We can see that the equations are equivalent, except that in variable elimination we have moved terms that are irrelevant to the summations outside of each summation!
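As a sketch of the difference, both computations can be written directly as nested sums over assumed CPTs (the numbers below are invented; only the placement of the factors relative to the sums differs):

```python
# Assumed CPTs for binary T, S, C and evidence +e, with T -> S, T -> C, (C, S) -> E.
P_T = {0: 0.7, 1: 0.3}
P_S_given_T = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.4, (1, 1): 0.6}   # key (s, t)
P_C_given_T = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}   # key (c, t)
P_e_given_CS = {(c, s): 0.1 + 0.4 * c + 0.4 * s                      # P(+e | c, s)
                for c in (0, 1) for s in (0, 1)}

# Inference by enumeration: sum the full product over s and c for each value of T.
enum = {t: sum(P_T[t] * P_S_given_T[(s, t)] * P_C_given_T[(c, t)] * P_e_given_CS[(c, s)]
               for s in (0, 1) for c in (0, 1))
        for t in (0, 1)}

# Variable elimination: pull each factor out of the sums that do not mention it.
elim = {t: P_T[t] * sum(P_S_given_T[(s, t)]
                        * sum(P_C_given_T[(c, t)] * P_e_given_CS[(c, s)] for c in (0, 1))
                        for s in (0, 1))
        for t in (0, 1)}

assert all(abs(enum[t] - elim[t]) < 1e-12 for t in (0, 1))

# Normalizing either result gives P(T | +e).
z = sum(enum.values())
print({t: enum[t] / z for t in (0, 1)})
```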

2 changes: 2 additions & 0 deletions bayes-nets/inference.md
@@ -3,6 +3,8 @@ title: '6.2 Probability Inference'
parent: 6. Bayes Nets
nav_order: 2
layout: page
header-includes:
\pagenumbering{gobble}
---

# 6.2 Probabilistic Inference
44 changes: 18 additions & 26 deletions bayes-nets/probability.md
@@ -3,6 +3,8 @@ title: "6.1 Probability Rundown"
parent: 6. Bayes Nets
nav_order: 1
layout: page
header-includes:
\pagenumbering{gobble}
---

# 6.1 Probability Rundown
@@ -11,57 +13,47 @@ We're assuming that you've learned the foundations of probability in CS70, so th

A **random variable** represents an event whose outcome is unknown. A **probability distribution** is an assignment of weights to outcomes. Probability distributions must satisfy the following conditions:

$$0 \leq P(\omega) \leq 1$$

$$\sum_{\omega}P(\omega) = 1$$

For instance, if $$A$$ is a binary variable (can only take on two values), then $$P(A = 0) = p$$ and $$P(A = 1) = 1 - p$$ for some $$p \in [0,1]$$.

We will use the convention that capital letters refer to random variables and lowercase letters refer to some specific outcome of that random variable.

We use the notation $$P(A, B, C)$$ to denote the **joint distribution** of the variables $$A, B, C$$. In joint distributions, ordering does not matter, i.e., $$P(A, B, C) = P(C, B, A)$$.

We can expand a joint distribution using the **chain rule**, also sometimes referred to as the product rule.

$$P(A, B) = P(A | B) P(B) = P(B | A) P(A)$$

$$P(A_1, A_2, \dots, A_k) = P(A_1) P(A_2 | A_1) \dots P(A_k | A_1, \dots, A_{k-1})$$

The **marginal distribution** of $$A, B$$ can be obtained by summing out all possible values that variable $$C$$ can take as $$P(A, B) = \sum_{c}P(A, B, C = c)$$. The marginal distribution of $$A$$ can also be obtained as $$P(A) = \sum_{b} \sum_{c}P(A, B = b, C = c)$$. We will also sometimes refer to the process of marginalization as "summing out."

When we do operations on probability distributions, sometimes we get distributions that do not necessarily sum to 1. To fix this, we **normalize**: take the sum of all entries in the distribution and divide each entry by that sum.
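A small sketch of both operations on a toy joint table (the table and variable names are made up for illustration):

```python
# A made-up joint distribution P(A, B, C) over binary variables.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.10, (1, 0, 1): 0.05,
    (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

# Marginalize ("sum out") C to get P(A, B), then B to get P(A).
p_ab = {}
for (a, b, c), p in joint.items():
    p_ab[(a, b)] = p_ab.get((a, b), 0.0) + p
p_a = {}
for (a, b), p in p_ab.items():
    p_a[a] = p_a.get(a, 0.0) + p

# Selecting the entries consistent with C = 1 leaves an unnormalized table;
# dividing by its sum normalizes it into P(A, B | C = 1).
selected = {(a, b): joint[(a, b, 1)] for a in (0, 1) for b in (0, 1)}
total = sum(selected.values())
p_ab_given_c1 = {k: v / total for k, v in selected.items()}

print(p_ab, p_a, p_ab_given_c1)
```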

<p>
</p>
**Conditional probabilities** assign probabilities to events conditioned on some known facts. For instance, $$P(A|B = b)$$ gives the probability distribution of $$A$$ given that we know the value of $$B$$ equals $$b$$. Conditional probabilities are defined as:

$$P(A|B) = \frac{P(A, B)}{P(B)}.$$

Combining the above definition of conditional probability and the chain rule, we get **Bayes' Rule**:

$$P(A | B) = \frac{P(B | A) P(A)}{P(B)}$$
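As a quick worked example, here is a sketch that applies Bayes' Rule to assumed numbers (a rare event $$A$$ and a noisy observation $$B$$; the values are placeholders):

```python
# Assumed numbers: a rare event A and a noisy observation B.
p_a = 0.01                 # P(A)
p_b_given_a = 0.9          # P(B | A)
p_b_given_not_a = 0.2      # P(B | not A)

# P(B) follows from summing out A: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Rule: P(A | B) = P(B | A) P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_b, 3), round(p_a_given_b, 4))  # 0.207 and about 0.0435
```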

To write that random variables $$A$$ and $$B$$ are **mutually independent**, we write $$A \perp\!\!\!\perp B$$. This is equivalent to $$B \perp\!\!\!\perp A$$.

<p>
</p>
When $$A$$ and $$B$$ are mutually independent, $$P(A, B) = P(A) P(B)$$. An example you can think of is two independent coin flips. You may be familiar with mutual independence as just "independence" in other courses. We can derive from the above equation and the chain rule that $$P(A | B) = P(A)$$ and $$P(B | A) = P(B)$$.

<p>
</p>
To write that random variables $$A$$ and $$B$$ are **conditionally independent** given another random variable $$C$$, we write $$A \perp\!\!\!\perp B | C$$. This is also equivalent to $$B \perp\!\!\!\perp A | C$$.

<p>
</p>
If $$A$$ and $$B$$ are conditionally independent given $$C$$, then $$P(A, B | C) = P(A | C) P(B | C)$$. This means that if we have knowledge about the value of $$C$$, then $$B$$ and $$A$$ do not affect each other. Equivalent to the above definition of conditional independence are the relations $$P(A | B, C) = P(A | C)$$ and $$P(B | A, C) = P(B | C)$$. Notice how these three equations are equivalent to the three equations for mutual independence, just with an added conditional on $$C$$!
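A short sketch that builds a joint where $$A \perp\!\!\!\perp B | C$$ holds by construction and checks these equivalent statements numerically (all CPT values are assumed):

```python
from itertools import product

# Assumed CPTs P(C), P(A | C), P(B | C); the joint below makes A and B
# conditionally independent given C by construction.
p_c = {0: 0.3, 1: 0.7}
p_a_c = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}   # p_a_c[c][a] = P(A=a | C=c)
p_b_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.8, 1: 0.2}}   # p_b_c[c][b] = P(B=b | C=c)

joint = {(a, b, c): p_c[c] * p_a_c[c][a] * p_b_c[c][b]
         for a, b, c in product((0, 1), repeat=3)}

def p(**fixed):
    return sum(pr for (a, b, c), pr in joint.items()
               if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items()))

for a, b, c in product((0, 1), repeat=3):
    # P(A, B | C) = P(A | C) P(B | C)
    assert abs(p(a=a, b=b, c=c) / p(c=c)
               - (p(a=a, c=c) / p(c=c)) * (p(b=b, c=c) / p(c=c))) < 1e-12
    # P(A | B, C) = P(A | C)
    assert abs(p(a=a, b=b, c=c) / p(b=b, c=c) - p(a=a, c=c) / p(c=c)) < 1e-12
```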

10 changes: 4 additions & 6 deletions bayes-nets/representation.md
@@ -3,6 +3,8 @@ title: '6.3 Bayesian Network Representation'
parent: 6. Bayes Nets
nav_order: 3
layout: page
header-includes:
\pagenumbering{gobble}
---

# 6.3 Bayesian Network Representation
@@ -40,15 +42,11 @@ In this Bayes Net, we would store probability tables $$P(B)$$, $$P(E)$$, $$P(A |

Given all of the CPTs for a graph, we can calculate the probability of a given assignment using the following rule:

$$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^n{P(X_i | \text{parents}(X_i))}$$

For the alarm model above, we can calculate the probability of a full joint assignment as follows:

$$P(-b, -e, +a, +j, -m) = P(-b) \cdot P(-e) \cdot P(+a | -b, -e) \cdot P(+j | +a) \cdot P(-m | +a)$$

We will see how this relation holds in the next section.
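A minimal sketch of this product for the alarm network, with assumed CPT values (they are placeholders, not the entries from the figure):

```python
# Assumed CPT values for the alarm network (B, E -> A -> J, M); placeholders only.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A_given_BE = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001}   # P(+a | b, e)
P_J_given_A = {True: 0.90, False: 0.05}                        # P(+j | a)
P_M_given_A = {True: 0.70, False: 0.01}                        # P(+m | a)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of one CPT entry per variable."""
    p_a = P_A_given_BE[(b, e)] if a else 1 - P_A_given_BE[(b, e)]
    p_j = P_J_given_A[a] if j else 1 - P_J_given_A[a]
    p_m = P_M_given_A[a] if m else 1 - P_M_given_A[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

# P(-b, -e, +a, +j, -m)
print(joint(False, False, True, True, False))
```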

14 changes: 5 additions & 9 deletions bayes-nets/structure.md
@@ -3,6 +3,8 @@ title: '6.4 Structure of Bayes Nets'
parent: 6. Bayes Nets
nav_order: 4
layout: page
header-includes:
\pagenumbering{gobble}
---

# 6.4 Structure of Bayes Nets
@@ -19,9 +21,7 @@ In this class, we will refer to two rules for Bayes Net independences that can b

Using these tools, we can return to the assertion in the previous section: that we can get the joint distribution of all variables by joining the CPTs of the Bayes Net.

$$P(X_1, X_2, \dots, X_n) = \prod_{i=1}^n P(X_i | \text{parents}(X_i))$$

This relation between the joint distribution and the CPTs of the Bayes net works because of the conditional independence relationships given by the graph. We will prove this using an example.

@@ -32,15 +32,11 @@ Let's revisit the previous example. We have the CPTs $$P(B)$$ , $$P(E)$$ , $$P(A

For this Bayes net, we are trying to prove the following relation:

$$P(B, E, A, J, M) = P(B)P(E)P(A | B, E)P(J | A)P(M | A)$$

We can expand the joint distribution another way: using the chain rule. If we expand the joint distribution with topological ordering (parents before children), we get the following equation:

$$P(B, E, A, J, M) = P(B)P(E | B)P(A | B, E)P(J | B, E, A)P(M | B, E, A, J)$$

<p></p>
Notice that in the first equation every variable is represented in a CPT $$P(var | Parents(var))$$, while in the second equation, every variable is represented in a CPT $$P(var | Parents(var), Ancestors(var))$$.
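To see the extra conditioning vanish, one can check numerically that, for example, $$P(J | B, E, A) = P(J | A)$$ whenever the joint is built from the Bayes net factorization. A sketch with assumed CPT values (not the ones from the figure):

```python
from itertools import product

# Assumed CPTs for B, E -> A -> J (M is omitted to keep the check short).
P_B = {1: 0.001, 0: 0.999}
P_E = {1: 0.002, 0: 0.998}
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | b, e)
P_J = {1: 0.90, 0: 0.05}                                          # P(J=1 | a)

def joint(b, e, a, j):
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_j = P_J[a] if j else 1 - P_J[a]
    return P_B[b] * P_E[e] * p_a * p_j

for b, e, a in product((0, 1), repeat=3):
    # P(J=1 | b, e, a) computed from the joint ...
    p_j_given_bea = joint(b, e, a, 1) / (joint(b, e, a, 0) + joint(b, e, a, 1))
    # ... collapses to P(J=1 | a): given its parent, J ignores its ancestors.
    assert abs(p_j_given_bea - P_J[a]) < 1e-12
```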
