Skip to content

Latest commit

 

History

History
160 lines (114 loc) · 6.89 KB

README.md

File metadata and controls

160 lines (114 loc) · 6.89 KB

Reinforcement Learning

⚠️ WARNING ⚠️
This guide deals deep Q-Learning after a certain point and so the python codes are written assuming that anyone following this is familiar with basic python programming, how to develop models, and matrices manipulation, etc. If you are not upto date on the concepts, please refresh them before jumping in the code. The README, on the other hand, is written very thorougly, explaining all concepts from beginning, and can be easily followed by anyone.

Contact us :

For any query, please reach us at [email protected] or create a discussion!!!

Topics

This is a guide to help users get started with reinforcement learning. It will cover the following topics :

♦️ Introduction and types of machine learning

♦️ Understanding reinforcement learning

♦️ How does the learning process works

♦️ Concepts of reinforcement learning

♦️ Understanding the Markov decision process

♦️ Implementation of Q-learning algorithm

🟧 Introduction and types of Machine Learning

The ability of a machine to learn and improve from experience rather than explicit programming is termed as machine learning.

Types of machine learning :

✳️ Supervised

✳️ Unsupervised

✳️ Reinforcement



🟧 Understanding Reinforcement Learning

Reinforcement means encouraging a pattern or behaviour. This form of ML is a hit and trial method, because the model is new to the surroundings. The only way to learn is to experience and then, learn from that experience.

The main idea is to learn through a "Reward and Punishment" system. For every correct move, there is a reward, for every wrong one, there is a punishment. The model tries to maximise the reward.

It is to be noted that 'training' datasets are not present here. Neither the fed data is classified or labled. The model needs to figure out the path and the best approach to perform a certain task. The environment is also unknown to the agent.

▶️ Consider an Analogy
Let's suppose a baby is learning to walk towards a box of chocolates. The box acts as reward. 2 things can happen here.
First, the baby starts walking and makes it to the box. This result is positive since the goal is achived, and hence the baby is rewarded.
Second, it starts walking but falls due to some obstacle. This is a negative result. Not only the goal isn't achieved, but the baby also got hurt. Hence, we can say that baby is punished.



🟧 How does the learning process works?

There are 2 main components of the process, the learing agent, and the environment. The environment, just like in the real world, determines all parameters of larning. In programming, we call that algorithm.

The process starts when environment sends a state to the agent. The agent then takes an action on it. Based on the action taken by the agent, the environment sends the next state and the respective reward to the agent. The agent then updates its knowledge based on the reward and evalutes its previous action on it.
This goes on and on till the environment sends a terminal state.



🟧 Concepts of reinforcement learning

⚛️ Reward Maximisation : The agent must be trained in a way, that it takes the best action in order to maximise the reward

⚛️ Exploration and Exploitation : The agent explores and captures more information about the environment, and then uses that information to highten the rewards



🟧 Understanding Markov decision process

This process is a mathematical approach for mapping a solution in reinforcement learning. It can be assumed that, the purpose of reinforcement learning is to solve a Markov decision process. There are a certain number of parameters to get to the solution. They include :

  • Set of actions (A)
  • Set of states (S)
  • Reward (R)
  • Policy used to approach the problem (pi)
  • Value (V)

The series of actions taken by the agent throughout the process, defines the policy. And the collection of rewards recieved defines the value. The goal is to maximise the rewards by choosing the optimum policy.



🟧 Implementation of the Q-learning algorithm

The idea of Q-learning, is to have "Q-values" for every action that a model takes, given a state. These values need to be updated overtime in such a way, that after a course of actions, good result is produced. It is done by "Rewarding" the agent.
Q-Learning is also known as "Model free learning" because the learning algorithm we write is applicable to any environment.

The Q-learning formula :
q-learning

The codes of Q-learning are in continuation. Please proceed serial wise to understand everything, else, it will be very confusing as to why we are doing a certain step. For running the environment, we need a library called gym.

Installing gym -

sudo pip install gym

Or -

sudo python3 -m pip install gym

⚠️ Note : gym needs another library called 'pyglet' to work. Pyglet comes pre-installed with openGL, but it is the outdated version on purpose, since python v2.x only supports pyglet version (<= 1.5.0, >= 1.4.0). If you are using python3(recommended), please update pyglet -

sudo pip install pyglet --upgrade

If, however, your machine does not have pyglet installed, please install the latest version for python3 -

sudo pip install pyglet

Or -

sudo python3 -m pip install pyglet

An example models directory and a logs directory has been attached for reference.

Now, you should go and check the following directories for programs based on reinforcement learning. All programs are daisy chained, so please go through them in sequence :

  1. Part 1
  2. Part 2
  3. Part 3
  4. Part 4
  5. Part 5
  6. Part 6

This tutorial is complete. Be sure to check out other ones too. Enjoy!!!