
Commit

Merge pull request #587 from harvard-edge/585-feedback-on-chapter-3-dl-primer

Improvement on Ch3 (DL Primer): add code snapshot for 3.5 training and 3.6 inference, fix equation typo
profvjreddi authored Jan 5, 2025
2 parents 4075b1c + 62c4b01 commit 58420d0
Showing 4 changed files with 80 additions and 23 deletions.
103 changes: 80 additions & 23 deletions contents/core/dl_primer/dl_primer.qmd
@@ -544,13 +544,24 @@ Neural networks learn to perform tasks through a process of training on examples

The core principle of neural network training is supervised learning from labeled examples. Consider our MNIST digit recognition task: we have a dataset of 60,000 training images, each a 28×28 pixel grayscale image paired with its correct digit label. The network must learn the relationship between these images and their corresponding digits through an iterative process of prediction and weight adjustment.
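
To make this setup concrete, the short sketch below loads MNIST and groups it into batches with torchvision and PyTorch's `DataLoader`. The batch size and normalization constants are illustrative assumptions, not values prescribed by this chapter.

```{.python}
# Illustrative sketch (not the chapter's code): loading MNIST in batches.
# The batch size and normalization statistics are assumptions.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),                      # 28x28 image -> tensor in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # commonly used MNIST statistics
])

train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([32, 1, 28, 28])
print(labels.shape)  # torch.Size([32])
```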

Training operates as a loop, where each iteration involves processing a subset of training examples called a batch. For each batch, the network performs several key operations:
::: {#fig-training layout-ncol=2}

![Training loop overview.](images/png/training-overview.png){#fig-training-overview height=5in}

![Detailed view of the training loop.](images/png/training-detailed.png){#fig-training-detailed height=5in}

Overview and details of the training loop.
:::

Training operates as a loop, where each iteration involves processing a subset of training examples called a batch. As shown in @fig-training-overview, for each batch, the network performs several key operations:

* Forward computation through the network layers to generate predictions
* Evaluation of prediction accuracy using a loss function
* Computation of weight adjustments based on prediction errors
* Update of network weights to improve future predictions

This process continues iteratively until the network achieves the desired performance. Each complete pass through all training examples is called an epoch, and the network typically requires multiple epochs to learn effectively.

This process can be expressed mathematically. Given an input image $x$ and its true label $y$, the network computes its prediction:

$$
@@ -650,10 +661,10 @@

In our MNIST example, if we have a batch of $B$ images, the dimensions of these operations are as follows (a short code sketch after the list makes these shapes concrete):

* Input $\mathbf{X}$: B × 784
* First layer weights $\mathbf{W}^{(1)}$: n₁ × 784
* Hidden layer weights $\mathbf{W}^{(l)}$: nₗ × n_{l-1}
* Output layer weights $\mathbf{W}^{(L)}$: 10 × n_{L-1}
* Input $\mathbf{X}$: $B$ × 784
* First layer weights $\mathbf{W}^{(1)}$: $n_{1}$ × 784
* Hidden layer weights $\mathbf{W}^{(l)}$: $n_{l}$ × $n_{l-1}$
* Output layer weights $\mathbf{W}^{(L)}$: 10 × $n_{L-1}$
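
To check these dimensions, the following sketch builds random tensors with the shapes listed above and passes a batch through the layers. The hidden-layer widths (128 and 64) are assumptions chosen purely for illustration, and activations are omitted since this only verifies shapes.

```{.python}
# Shape check for a batch of B MNIST images flowing through a small MLP.
# Hidden sizes n1 = 128 and n2 = 64 are illustrative assumptions.
import torch

B = 32
X = torch.randn(B, 784)      # input batch: B x 784
W1 = torch.randn(128, 784)   # first layer weights: n1 x 784
W2 = torch.randn(64, 128)    # hidden layer weights: n2 x n1
W3 = torch.randn(10, 64)     # output layer weights: 10 x n2

H1 = X @ W1.T                # B x n1
H2 = H1 @ W2.T               # B x n2
Y = H2 @ W3.T                # B x 10 class scores
print(Y.shape)               # torch.Size([32, 10])
```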

#### Computational Process

@@ -977,26 +988,47 @@ In our MNIST training, with a typical batch size of 32, this means:

#### Training Loop

The complete training process combines forward propagation, backward propagation, and weight updates into a systematic training loop. This loop repeats until the network achieves satisfactory performance or reaches a predetermined number of iterations.

A single pass through the entire training dataset is called an epoch. For MNIST, with 60,000 training images and a batch size of 32, each epoch consists of 1,875 batch iterations. The training loop structure is:

1. For each epoch:
   * Shuffle training data to prevent learning order-dependent patterns
   * For each batch:
     * Perform forward propagation
     * Compute loss
     * Execute backward propagation
     * Update weights using gradient descent
   * Evaluate network performance
The complete training process combines forward propagation, backward propagation, and weight updates into a systematic training loop. This loop, shown earlier in @fig-training-overview, repeats until the network achieves satisfactory performance or reaches a predetermined number of iterations. A single pass through the entire training dataset is called an epoch; for MNIST, with 60,000 training images and a batch size of 32, each epoch consists of 1,875 batch iterations. @fig-training-detailed illustrates the training loop structure in greater detail, and the code below implements it at a high level.

```{.python}
#| label: code-training-loop
# Simple training loop
for epoch in range(num_epochs):
    for batch in training_data:
        # 1. Make prediction
        prediction = model(batch.data)
        # 2. Calculate loss
        loss = loss_function(prediction, batch.target)
        # 3. Calculate improvements
        loss.backward()
        # 4. Update the network
        optimizer.step()
        optimizer.zero_grad()  # Reset for next time
```
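
The loop above assumes that `model`, `loss_function`, `optimizer`, `training_data`, and `num_epochs` are defined elsewhere. One plausible setup is sketched below; the architecture and learning rate are illustrative assumptions, while the epoch count of 50 simply mirrors the sample output shown later.

```{.python}
# Possible setup for the loop above; the architecture and learning rate are
# illustrative assumptions, not the chapter's choices.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),          # 28x28 image -> 784-dimensional vector
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),    # 10 output scores, one per digit
)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
num_epochs = 50
# training_data is assumed to yield batch objects exposing .data and .target,
# as accessed in the chapter's loop.
```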

During training, we monitor several key metrics (one way to compute them is sketched after this list):

* Training loss: average loss over recent batches
* Validation accuracy: performance on held-out test data
* Learning progress: how quickly the network improves
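
A possible sketch of this monitoring step, reusing `model` and `loss_function` from the training loop above and assuming a hypothetical `validation_loader` of held-out examples:

```{.python}
# Sketch: estimating validation loss and accuracy on held-out data.
# validation_loader is a hypothetical DataLoader, not part of the chapter's code.
import torch

model.eval()                      # disable training-specific behavior
total_loss, correct, total = 0.0, 0, 0
with torch.no_grad():             # no gradients needed for evaluation
    for inputs, targets in validation_loader:
        outputs = model(inputs)
        total_loss += loss_function(outputs, targets).item()
        predicted = outputs.argmax(dim=1)           # most likely digit
        correct += (predicted == targets).sum().item()
        total += targets.size(0)

validation_loss = total_loss / len(validation_loader)
validation_accuracy = 100.0 * correct / total
print(f"Validation Loss: {validation_loss:.6f}")
print(f"Validation Accuracy: {validation_accuracy:.2f}%")
model.train()                     # return to training mode
```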

For our digit recognition task, we might observe the network's accuracy improve from 10% (random guessing) to over 95% through multiple epochs of training.
For our digit recognition task, the algorithm iterates through epochs (complete passes through the training data) and, within each epoch, processes smaller batches of data. For each batch, the code follows our four key steps: making predictions, calculating the loss, determining how to improve (backward pass), and updating the network's weights. The sample output below shows the actual progress of this training process. The network starts epoch 1 with poor performance: a training loss of 2.30 and only 12.54% accuracy, essentially random guessing. By epoch 10, the loss has dropped to 0.34 and the network correctly identifies digits 90.12% of the time. This improvement illustrates how the systematic repetition of our training loop gradually transforms the network from random guesses to informed predictions.

```
Epoch: 1/50
Training Loss: 2.302891
Validation Loss: 2.298756
Validation Accuracy: 12.54%
...
Epoch: 10/50
Training Loss: 0.342891
Validation Loss: 0.338756
Validation Accuracy: 90.12%
```

#### Practical Considerations

@@ -1012,7 +1044,7 @@ Training neural networks also presents several fundamental challenges. Overfitti

## Prediction Phase

Neural networks serve two distinct purposes: learning from data during training and making predictions during inference. While we've explored how networks learn through forward propagation, backward propagation, and weight updates, the prediction phase operates differently. During inference, networks use their learned parameters to transform inputs into outputs without the need for learning mechanisms. This simpler computational process still requires careful consideration of how data flows through the network and how system resources are utilized. Understanding the prediction phase is crucial as it represents how neural networks are actually deployed to solve real-world problems, from classifying images to generating text predictions.
Neural networks serve two distinct purposes: learning from data during training and making predictions during inference. While we have explored how networks learn through forward propagation, backward propagation, and weight updates, the prediction phase, more commonly referred to as inference, operates differently. During inference, networks use their learned parameters to transform inputs into outputs without the need for learning mechanisms. This simpler computational process still requires careful consideration of how data flows through the network and how system resources are utilized. Understanding the inference phase is crucial as it represents how neural networks are actually deployed to solve real-world problems, from classifying images to generating text predictions.

### Inference Fundamentals

@@ -1058,7 +1090,7 @@ This stark contrast between training and inference phases highlights why system

The implementation of neural networks in practical applications requires a complete processing pipeline that extends beyond the network itself. This pipeline, illustrated in @fig-inference-pipeline, transforms raw inputs into meaningful outputs through a series of distinct stages, each essential for the system's operation. Understanding this complete pipeline provides critical insights into the design and deployment of machine learning systems.

![End-to-end workflow for the inference prediction phase.](images/png/inference_pipeline.png){#fig-inference-pipeline}
![End-to-end workflow for the inference phase.](images/png/inference_pipeline.png){#fig-inference-pipeline}

The key thing to notice from the figure is that machine learning systems operate as hybrid architectures that combine conventional computing operations with neural network computations. The neural network component, focused on learned transformations through matrix operations, represents just one element within a broader computational framework. This framework encompasses both the preparation of input data and the interpretation of network outputs, processes that rely primarily on traditional computing methods.
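
As a small illustration of the input-preparation side, the sketch below converts a raw 28×28 grayscale image into the flat, normalized vector our network expects. The scaling and flattening choices are assumptions for illustration, not steps specified in this chapter.

```{.python}
# Sketch: conventional pre-processing that turns a raw 28x28 grayscale image
# into a flat, normalized input vector. Dividing by 255 is an assumed scaling.
import numpy as np

def preprocess(raw_image: np.ndarray) -> np.ndarray:
    pixels = raw_image.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    return pixels.reshape(1, 784)                  # flatten to a 1 x 784 row vector

# Example: a synthetic "image" standing in for real input data
fake_image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)
network_input = preprocess(fake_image)
print(network_input.shape)  # (1, 784)
```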

@@ -1210,13 +1242,38 @@ These optimization principles, while illustrated through our simple MNIST feedfo

The transformation of neural network outputs into actionable predictions requires a return to traditional computing paradigms. Just as pre-processing bridges real-world data to neural computation, post-processing bridges neural outputs back to conventional computing systems. This completes the hybrid computing pipeline we examined earlier, where neural and traditional computing operations work in concert to solve real-world problems.

The complexity of post-processing extends beyond simple mathematical transformations. Real-world systems must handle uncertainty, validate outputs, and integrate with larger computing systems. In our MNIST example, a digit recognition system might require not just the most likely digit, but also confidence measures to determine when human intervention is needed. This introduces additional computational steps: confidence thresholds, secondary prediction checks, and error handling logic---all of which are implemented in traditional computing frameworks.
The complexity of post-processing extends beyond simple mathematical transformations. Real-world systems must handle uncertainty, validate outputs, and integrate with larger computing systems. In our MNIST example, a digit recognition system might require not just the most likely digit, but also confidence measures to determine when human intervention is needed.

As illustrated in @fig-post-processing, the system transforms raw neural network outputs through several stages: first applying softmax normalization to convert logits to probabilities, then implementing confidence thresholds to determine when human verification is needed, and finally formatting successful predictions for downstream use.

![Post-processing pipeline for MNIST digit recognition.](images/png/post-processing-flow.png){#fig-post-processing}
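
A minimal sketch of these stages, and of the `post_process` helper used in the inference example later in this section, might look as follows. The 0.9 confidence threshold and the dictionary output format are assumptions chosen for illustration.

```{.python}
# Sketch of the post-processing stages in @fig-post-processing:
# softmax -> confidence threshold -> formatted output.
# The threshold value and output format are illustrative assumptions.
import torch

CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff for automatic acceptance

def post_process(logits: torch.Tensor) -> list[dict]:
    probabilities = torch.softmax(logits, dim=1)    # logits -> probabilities
    confidences, digits = probabilities.max(dim=1)  # best class per image
    results = []
    for digit, confidence in zip(digits.tolist(), confidences.tolist()):
        if confidence >= CONFIDENCE_THRESHOLD:
            results.append({"digit": digit, "confidence": confidence})
        else:
            # Low confidence: flag for human verification instead of auto-routing
            results.append({"digit": None, "needs_review": True,
                            "confidence": confidence})
    return results
```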

The computational requirements of post-processing differ significantly from neural network inference. While inference benefits from parallel processing and specialized hardware, post-processing typically runs on conventional CPUs and follows sequential logic patterns. This return to traditional computing brings both advantages and constraints. Operations are more flexible and easier to modify than neural computations, but they may become bottlenecks if not carefully implemented. For instance, computing softmax probabilities for a batch of predictions requires different optimization strategies than the matrix multiplications of neural network layers.

System integration considerations often dominate post-processing design. Output formats must match downstream system requirements, error handling must align with broader system protocols, and performance must meet system-level constraints. In a complete mail sorting system, the post-processing stage must not only identify digits but also format these predictions for the sorting machinery, handle uncertainty cases appropriately, and maintain processing speeds that match physical mail flow rates.

This return to traditional computing paradigms completes the hybrid nature of machine learning systems. Just as pre-processing prepared real-world data for neural computation, post-processing adapts neural outputs for real-world use. Understanding this hybrid nature---the interplay between neural and traditional computing--- is essential for designing and implementing effective machine learning systems.
This return to traditional computing paradigms completes the hybrid nature of machine learning systems. Just as pre-processing prepared real-world data for neural computation, post-processing adapts neural outputs for real-world use. This hybrid nature---the interplay between neural and traditional computing---is essential for designing and implementing effective machine learning systems.

Below, we provide a code example illustrating the inference process:

```{.python}
# Switch model to evaluation mode
model.eval()
# Inference loop
for batch in inference_data_loader:
    # 1. Load input data
    inputs = batch
    # 2. Forward pass (no gradient computation required)
    with torch.no_grad():
        predictions = model(inputs)
    # 3. Post-process predictions (optional)
    processed_predictions = post_process(predictions)
    # (Optional) Output or store predictions
    print(f"Predictions: {processed_predictions}")
```

## Case Study: USPS Postal Service

@@ -1280,7 +1337,7 @@ As we move forward to explore more complex architectures and applications in sub

In this chapter, we explored the foundational concepts of neural networks, bridging the gap between biological inspiration and artificial implementation. We began by examining the remarkable efficiency and adaptability of the human brain, uncovering how its principles influence the design of artificial neurons. From there, we delved into the behavior of a single artificial neuron, breaking down its components and operations. This understanding laid the groundwork for constructing neural networks, where layers of interconnected neurons collaborate to tackle increasingly complex tasks.

The progression from single neurons to network-wide behavior underscored the power of hierarchical learning, where each layer extracts and transforms patterns from raw data into meaningful abstractions. We examined both the learning process and the prediction phase, showing how neural networks first refine their performance through training and then deploy that knowledge through inference. The distinction between these phases revealed important system-level considerations for practical implementations.
The progression from single neurons to network-wide behavior underscored the power of hierarchical learning, where each layer extracts and transforms patterns from raw data into meaningful abstractions. We examined both the learning process and the inference phase, showing how neural networks first refine their performance through training and then deploy that knowledge through inference. The distinction between these phases revealed important system-level considerations for practical implementations.

Our exploration of the complete processing pipeline---from pre-processing through inference to post-processing---highlighted the hybrid nature of machine learning systems, where traditional computing and neural computation work together. The USPS case study demonstrated how these theoretical principles translate into practical applications, revealing both the power and complexity of deployed neural networks. These real-world considerations, from data collection to system integration, form an essential part of understanding machine learning systems.

