From 0b83c30ef78d5d3c0f4b9f99ffa854e9439443c4 Mon Sep 17 00:00:00 2001
From: Vijay Janapa Reddi
Date: Tue, 31 Oct 2023 08:53:24 -0400
Subject: [PATCH] Section header fix

---
 optimizations.qmd | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/optimizations.qmd b/optimizations.qmd
index 21ec4cfb..70773892 100644
--- a/optimizations.qmd
+++ b/optimizations.qmd
@@ -136,6 +136,7 @@ More formally, the lottery ticket hypothesis is a concept in deep learning that
 
 
 ![An example experiment from the lottery ticket hypothesis showing pruning and training experiments on a fully connected LeNet over a variety of pruning ratios: note the first plot showing how pruning is able to reveal a subnetwork nearly one-fifth the size that trains to a higher test accuracy faster than the unpruned network. However, note in the second plot how further pruning yields models that both train more slowly and are unable to achieve that same maximal test accuracy due to the lower number of parameters (Credit: ICLR).](images/modeloptimization_lottery_ticket_hypothesis.png)
+
 #### Challenges & Limitations
 
 There is no free lunch with pruning optimizations.
@@ -207,6 +208,7 @@ One of the seminal works in the realm of matrix factorization, particularly in t
 The main advantage of low-rank matrix factorization lies in its ability to reduce data dimensionality, as shown in the image below, where there are fewer parameters to store, making it computationally more efficient and reducing storage requirements at the cost of some additional compute. This can lead to faster computations and more compact data representations, which is especially valuable when dealing with large datasets. Additionally, it may aid in noise reduction and can reveal underlying patterns and relationships in the data.
 
 ![A visualization showing the decrease in parameterization enabled by low-rank matrix factorization. Observe how the matrix $M$ can be approximated by the product of matrices $L_k$ and $R_k^T$. For intuition, most fully connected layers in networks are stored as a projection matrix $M$, which requires $m \times n$ parameters to be loaded on computation. However, by decomposing and approximating it as the product of two lower-rank matrices, we only need to store $m \times k + k \times n$ parameters while incurring the additional compute cost of the matrix multiplication. So long as $k < \frac{mn}{m + n}$ (roughly $k < n/2$ when $m \approx n$), this factorization has fewer parameters to store in total while adding a computation of runtime $O(mkn)$ (Credit: Medium).](images/modeloptimization_low_rank_matrix_factorization.png)
+
 ##### Challenges
 
 But practitioners and researchers encounter a spectrum of challenges and considerations that necessitate careful attention and strategic approaches. As with any lossy compression technique, we may lose information during this approximation process: choosing the correct rank that balances the information lost against the computational costs is tricky and adds an additional hyperparameter to tune.