date:
  updated: 2024-06-10
title: Deep Learning
description: |
  Some related concepts of Deep Learning, such as the Perceptron and MLP, with notation and coding examples.
slug: deep-learning
icon: simple/pytorch
---

# :simple-pytorch: Deep Learning

## Machine Learning VS Deep Learning

introduce non-linearity.
- The perceptron's geometric intuition is very similar to that of the Linear Regression algorithm.
- A perceptron is limited to classifying only linearly (or nearly linearly) separable classes.

### Limitation

A perceptron only works on linear data; it cannot learn non-linear patterns because it is a linear model that draws a
line/plane/hyperplane through the dataset.
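
To make the line/plane/hyperplane picture concrete, here is a minimal sketch of the perceptron decision rule in plain
NumPy; the weights and bias below are hand-picked for illustration, not learned:

```python title="Perceptron decision rule (illustrative sketch)"
import numpy as np

def perceptron(x: np.ndarray, w: np.ndarray, b: float) -> int:
    """Step activation over a weighted sum: 1 if x lies on/above the line w.x + b = 0, else 0."""
    return int(np.dot(w, x) + b >= 0)

w = np.array([1.0, 1.0])  # normal vector of the separating line (hand-picked)
b = -1.5                  # offset of the line (hand-picked)

print(perceptron(np.array([1, 1]), w, b))  # 1 -> falls on the positive side of the line
print(perceptron(np.array([0, 0]), w, b))  # 0 -> falls on the negative side of the line
```

Everything such a model can express is a single straight boundary, which is exactly the limitation described above.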

## :material-family-tree:{ style="transform: rotate(90deg)" } Multi-Layer Perceptron (MLP)

A **Multi-Layer Perceptron (MLP)** is a class of feedforward artificial neural networks that consist of multiple layers
of nodes in a directed graph. Each node, except for the input nodes, represents a neuron that uses a non-linear
activation function. MLPs are capable of learning complex patterns in data, including non-linear relationships, making
them widely used in machine learning tasks like classification, regression, and feature extraction.

By using a single perceptron, we are limited to learning only linear decision boundaries. This restricts its ability to
model more complex datasets with inherent non-linear relationships. To overcome this limitation, we can add more
perceptrons in a structured way to create a **Multi-Layer Perceptron**.

### Why Don’t We Use Only One Layer Instead?

A single-layer perceptron can only model linearly separable data. For example, the XOR problem cannot be solved by a
single-layer perceptron, as its decision boundary is inherently linear. However, real-world datasets are often
non-linear in nature.

Adding hidden layers in an MLP allows the network to transform input features through non-linear activation functions,
enabling it to create complex decision boundaries. These hidden layers progressively learn higher-order features, making
the MLP a universal function approximator, as stated by the Universal Approximation Theorem.
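
A quick way to see this: no single straight line separates the XOR truth table, but one hidden layer of two units
already can. The following sketch uses hand-picked (not learned) weights and step activations purely to show that such
a function exists:

```python title="XOR with one hidden layer (hand-picked weights)"
import numpy as np

def step(z):
    return (z >= 0).astype(int)

# Hidden layer: unit 1 behaves like OR(x1, x2), unit 2 like AND(x1, x2).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output layer: OR minus AND (with a threshold) gives XOR.
W2 = np.array([1.0, -1.0])
b2 = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W1 @ np.array(x) + b1)  # hidden activations
    y = step(W2 @ h + b2)            # output
    print(x, "->", int(y))           # prints the XOR truth table
```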

### How Does MLP Capture Non-Linearity?

MLPs capture non-linearity through two primary mechanisms:

1. **Non-Linear Activation Functions:** Non-linear activation functions like ReLU, Sigmoid, or Tanh introduce the
ability to model complex patterns in the data. Without these functions, the MLP would behave like a linear model,
regardless of the number of layers.

2. **Layered Structure:** Each hidden layer processes inputs to extract increasingly abstract features. These features,
when passed through activation functions, enable the network to learn representations that are non-linear
transformations of the original data.

By stacking layers, the network learns a hierarchy of features, where lower layers might capture simple patterns (like
edges in an image), and deeper layers capture more abstract representations (like shapes or objects).
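
One way to convince yourself that the activation functions are doing the work: two stacked linear layers with nothing
non-linear between them collapse into a single linear map. A minimal PyTorch sketch (layer sizes are arbitrary):

```python title="Stacked linear layers collapse without activations"
import torch
import torch.nn as nn

torch.manual_seed(0)

f1 = nn.Linear(4, 16, bias=False)  # first "layer"
f2 = nn.Linear(16, 1, bias=False)  # second "layer", no activation in between

x = torch.randn(10, 4)
stacked = f2(f1(x))                # looks deep, but is purely linear

combined = f2.weight @ f1.weight   # the equivalent single weight matrix, shape (1, 4)
single = x @ combined.T            # one linear map gives the same result

print(torch.allclose(stacked, single, atol=1e-6))  # True
```

Inserting a ReLU (or any other non-linear function) between `f1` and `f2` breaks this equivalence, which is what gives
the extra layer its expressive power.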

### Forward Pass in MLP

The **forward pass** is the process of passing input data through the network to compute the output predictions. It
involves the following steps:

1. **Input Layer:** The input data is fed into the network as feature vectors.

2. **Hidden Layers:** Each hidden layer computes a weighted sum of its inputs, adds a bias term, and applies an
activation function. Mathematically, for a hidden layer $l$:

??? info "Hidden layers in Notation"

$$h^{l} = f(W^{l}h^{(l-1)} + b^{l})$$

where:

- $W^{l}$ is the weight matrix for layer $l$.
- $b^{l}$ is the bias vector for layer $l$.
- $f$ is the activation function (e.g., ReLU).
- $h^{(l-1)}$ is the output from the previous layer.

3. **Output Layer:** The final layer produces predictions, often applying a specific activation function (e.g., softmax
for classification or linear activation for regression).
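
As a sanity check of the notation above, here is a minimal sketch that computes one hidden layer and a sigmoid output
by hand; the sizes and random values are arbitrary:

```python title="Forward pass written out by hand"
import torch

torch.manual_seed(0)

x = torch.randn(4)       # input feature vector, h^0
W1 = torch.randn(16, 4)  # weight matrix of the hidden layer
b1 = torch.randn(16)     # bias vector of the hidden layer
W2 = torch.randn(1, 16)  # weight matrix of the output layer
b2 = torch.randn(1)      # bias of the output layer

h1 = torch.relu(W1 @ x + b1)     # h^1 = f(W^1 h^0 + b^1)
y = torch.sigmoid(W2 @ h1 + b2)  # output prediction in (0, 1)
print(y)
```

The framework example below does the same computation, with the weight matrices and biases managed by the `Dense`
layers.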

```python title="Example using Keras"
from keras import Sequential
from keras.layers import Dense

# Define a simple MLP model
model = Sequential([
Dense(16, activation='relu', input_shape=(4,)), # Input layer
Dense(8, activation='relu'), # Hidden layer
Dense(1, activation='sigmoid') # Output layer
])

model.summary()
```

!!! info

    The forward pass results in the computation of predictions based on the current weights and biases.

### Backward Propagation in MLP

The **backward propagation** algorithm is used to train the MLP by adjusting weights and biases to minimize the error
between predicted and actual outputs. It works in the following steps:

1. **Compute Loss:** A loss function (e.g., Mean Squared Error for regression or Cross-Entropy Loss for classification)
measures the error between the predicted output and the true labels.

2. **Backpropagation of Errors:** The error is propagated backward through the network using the chain rule of calculus
to compute the gradient of the loss function with respect to each weight and bias. The gradients for each parameter
are computed layer by layer in reverse order.

3. **Update Weights and Biases:** Using the computed gradients, the weights and biases are updated using an optimization
algorithm like Gradient Descent or its variants (e.g., Adam, RMSprop):

??? info "Backward Propagation with Notation"

$$W^{l} = W^{l} - \eta \frac{\partial \mathcal{L}}{\partial W^{l}}$$

$$b^{l} = b^{l} - \eta \frac{\partial \mathcal{L}}{\partial b^{l}}$$

where:

- $\eta$ is the learning rate.
- $\mathcal{L}$ is the loss function.
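
Spelled out without an optimizer object, one update step amounts to calling autograd for the gradients and then
applying the rule above; a minimal sketch for a single linear layer (shapes, data, and learning rate are arbitrary):

```python title="One gradient-descent step by hand"
import torch

torch.manual_seed(0)
lr = 0.1  # learning rate (eta)

W = torch.randn(1, 4, requires_grad=True)  # weights of a tiny one-layer model
b = torch.zeros(1, requires_grad=True)     # bias

x = torch.randn(8, 4)   # dummy inputs
y = torch.randn(8, 1)   # dummy regression targets

loss = ((x @ W.T + b - y) ** 2).mean()  # Mean Squared Error
loss.backward()                         # gradients of the loss w.r.t. W and b

with torch.no_grad():   # W := W - eta * dL/dW, and likewise for b
    W -= lr * W.grad
    b -= lr * b.grad
```

Optimizers like Adam or RMSprop wrap this same pattern with extra bookkeeping, which is what `optimizer.step()` does in
the full example below.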

```python title="Example using PyTorch"
import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple MLP model
class SimpleMLP(nn.Module):
def __init__(self):
super(SimpleMLP, self).__init__()
self.fc1 = nn.Linear(4, 16) # Input to hidden layer
self.fc2 = nn.Linear(16, 8) # Hidden to hidden layer
self.fc3 = nn.Linear(8, 1) # Hidden to output layer

def forward(self, x):
x = torch.relu(self.fc1(x))
x = torch.relu(self.fc2(x))
x = torch.sigmoid(self.fc3(x))
return x

# Instantiate the model, loss function, and optimizer
model = SimpleMLP()
criterion = nn.BCELoss() # Binary Cross Entropy Loss
optimizer = optim.Adam(model.parameters(), lr=0.01)

<div class="grid" markdown>
# Dummy data for training
inputs = torch.randn(10, 4) # Batch of 10 samples, 4 features each
targets = torch.randint(0, 2, (10, 1)).float() # Binary targets

:simple-youtube:
[Problem with Perceptron](https://www.youtube.com/watch?v=Jp44b27VnOg&list=PLKnIA16_RmvYuZauWaPlRTC54KxSNLtNn&index=7)
# Training loop
for epoch in range(10):
# Forward pass
outputs = model(inputs)
loss = criterion(outputs, targets)

</div>
# Backward pass
optimizer.zero_grad()
loss.backward()
optimizer.step()

### Perceptron Resources
if (epoch + 1) % 10 == 0:
print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}")
```

<div class="grid" markdown>
!!! info

    By iteratively performing forward and backward passes over multiple epochs, the network learns the optimal
    parameters to minimize the loss and generalize to unseen data.

### Importance of MLP

By combining these techniques, MLPs become powerful tools for modeling both linear and non-linear patterns in data. They
form the foundation of many advanced deep learning architectures, such as **Convolutional Neural Networks (CNNs)** and
**Recurrent Neural Networks (RNNs)**.
