This repo is a playground for testing reinforcement learning algorithms on a labyrinth. The following agents are implemented to solve this task:
- Actor-Critic
- Deep-Q Networks
- Value Iteration / Policy Iteration (Dynamic programming)
- SARSA / SARSA($\lambda$) / Q-Learning
Before running the code, install the following required modules:
- numpy==1.21.0
- random2==1.0.1
- tqdm==4.61.1
- matplotlib==3.4.2
- torch==1.9.0
- keyboard==0.13.5
- pygame==2.0.1
All the requirements can be installed by running `pip install -r requirements.txt`.
The environment used in this task is custom-made. It was created so that any labyrinth of size n×m (with n < m) can be generated automatically, with parameters that control the complexity of the maze. The methods for generating the maze are located in `maze_generator.py`.
To generate a maze, you can call the generator as follows:
```python
from maze_generator import *

generator = maze_generator()
maze, path = generator.generate_maze(size_x=20, size_y=40, start_coord=[0, 1], finish_coord=[19, 39], n_of_turns=4, log=False)
```
Alternatively, you can call the load function if you have already generated and saved a maze:
```python
from maze_generator import *

generator = maze_generator()
maze, path, player_coords, win_cords = generator.load_maze_from_csv("saved_maze/maze1")
```
The possible actions the agent can take, and the rewards it can gain, are reported in the following table. In this environment the objective of the agent is to minimize the number of points it loses; the initial score is `n_of_viable_states * 0.4`.
Action ID | Action | Reward | Reward on backtrack | Reward on action fail |
---|---|---|---|---|
"up" | Player tries to move up 1 step | -0.05 | -0.25 | -0.75 |
"down" | Player tries to move down 1 step | -0.05 | -0.25 | -0.75 |
"left" | Player tries to move left 1 step | -0.05 | -0.25 | -0.75 |
"right" | Player tries to move right 1 step | -0.05 | -0.25 | -0.75 |
"jump_down" | Player tries to jump down, moving 2 positions down | -0.15 | -0.25 | -0.75 |
"jump_up" | Player tries to jump up, moving 2 positions up | -0.15 | -0.25 | -0.75 |
"jump_right" | Player tries to jump right, moving 2 positions right | -0.15 | -0.25 | -0.75 |
"jump_left" | Player tries to jump left, moving 2 positions left | -0.15 | -0.25 | -0.75 |
If the agent reaches the finishing state (win position), it gains +1 point.
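As a rough illustration of the reward scheme above, the per-step reward could be computed as in the following sketch (hypothetical names such as `step_reward` and `visited`; this is not the repository's actual implementation):

```python
# Sketch of the reward scheme described in the table above.
# Hypothetical helper, not the repository's actual code.
STEP_COST = {
    "up": -0.05, "down": -0.05, "left": -0.05, "right": -0.05,
    "jump_up": -0.15, "jump_down": -0.15, "jump_left": -0.15, "jump_right": -0.15,
}

def step_reward(action, old_pos, new_pos, visited, win_pos):
    """Return the reward of a single transition."""
    if new_pos == win_pos:
        return 1.0        # reached the finishing state
    if new_pos == old_pos:
        return -0.75      # action failed (e.g. walked or jumped into a wall)
    if new_pos in visited:
        return -0.25      # backtracked to an already visited cell
    return STEP_COST[action]
```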
An action is considered failed if the state after the action is unchanged. For example, if the agent tries to go left but the only thing on its left is a wall, the action fails:
```
01000000              01000000
09111111  --left-->   09111111
00000001              00000001
```
The jump mechanic allows the agent to jump over walls; in other words, it allows the agent to skip over one cell, e.g.:
```
1000000                 1000000        1000111                 1000111
1000100                 1000100        1000100                 1000100
1911111 --jump_right--> 1119111   or   1190111 --jump_right--> 1110900
0000001                 0000001        1000100                 1000100
0000001                 0000001        1111100                 1111100
```
(where 1 is a legal path cell, 0 is a wall, and 9 is the player)
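A minimal sketch of how a move or jump could be resolved against this grid representation (assuming `maze` is a list of rows with 1 = path and 0 = wall; `resolve_move` is a hypothetical helper, not the repository's actual code):

```python
# Hypothetical movement resolution: steps move 1 cell, jumps move 2 cells,
# so a jump can clear a single wall because only the landing cell is checked.
DELTAS = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
    "jump_up": (-2, 0), "jump_down": (2, 0), "jump_left": (0, -2), "jump_right": (0, 2),
}

def resolve_move(maze, pos, action):
    """Return the new (row, col) position, or the old one if the action fails."""
    dr, dc = DELTAS[action]
    r, c = pos[0] + dr, pos[1] + dc
    inside = 0 <= r < len(maze) and 0 <= c < len(maze[0])
    if inside and maze[r][c] == 1:   # landing cell must be a path cell
        return (r, c)
    return pos                       # blocked or out of bounds: the action fails
```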
The algorithms can be called from `main.py`; they can be used either on a pre-existing maze or on a newly generated one.
`main.py` accepts the following parameters:
Parameter | Type | Default | Description |
---|---|---|---|
Parameters to choose which RL algorithms to run | | | |
--run_all | bool | True | Run all RL algorithms on the maze |
--run_AC | bool | False | Run Actor-Critic on the maze |
--run_DQN | bool | False | Run Deep Q-Network on the maze |
--run_VI | bool | False | Run value iteration on the maze |
--run_SA | bool | False | Run SARSA on the maze |
--run_SAL | bool | False | Run SARSA-lambda on the maze |
--run_Q | bool | False | Run Q-learning on the maze |
--run_RA | bool | False | Run the random approach on the maze |
Parameters for the maze | | | |
--maze_path | str | "" | Path to an existing maze .csv file to load |
--maze_name | str | "" | The name to assign to the generated maze |
--save_path | str | "" | Path to save any generated file, e.g. policies |
--maze_x | int | 20 | Height of the maze |
--maze_y | int | 40 | Width of the maze |
--maze_fp | int | 1 | The minimum number of focal points to add to the maze |
Parameters for deep models | | | |
--batch_size | int | 512 | Batch size for the deep models |
--epsilon_decay | float | 0.999 | The decay applied to epsilon at each episode |
--episodes | int | 2000 | Number of episodes to run |
--replay_buffer_size | int | 50000 | The size of the replay buffer from which the batches are sampled |
--cuda | bool | True | Use the GPU for training |
--epsilon_start | float | 1 | The starting value of epsilon for the epsilon-greedy choice of actions in the deep models |
Parameters for RL | | | |
--gamma | float | 0.99 | The discount factor |
--epsilon | float | 0.2 | The epsilon value for the epsilon-greedy choice of actions (does not apply to deep models) |
--alpha | float | 0.5 | Alpha (learning rate) term for SARSA and Q-learning |
--lambda_s | float | 0.2 | Lambda term for SARSA-lambda |
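For reference, `--alpha` and `--gamma` enter the tabular methods through the standard update rules:

$$Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma\,Q(s',a') - Q(s,a)\big] \qquad \text{(SARSA)}$$

$$Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big] \qquad \text{(Q-learning)}$$

In SARSA($\lambda$), `--lambda_s` sets the decay of the eligibility traces, which shrink by a factor $\gamma\lambda$ at every step.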
Using `--run_all` applies the same parameters to all the algorithms, which may prevent some of them from finding their ideal solution.
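As an illustration of how `--epsilon_start`, `--epsilon_decay` and `--epsilon` are typically used, here is a hedged sketch of epsilon-greedy action selection with per-episode decay (hypothetical `select_action` helper, not the repository's actual code):

```python
import random

def select_action(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

epsilon = 1.0           # --epsilon_start (deep models); tabular methods keep --epsilon fixed
epsilon_decay = 0.999   # --epsilon_decay
for episode in range(2000):  # --episodes
    # ... run one episode, calling select_action(q_values, epsilon) at each step ...
    epsilon *= epsilon_decay  # decay once per episode
```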
The parameters used to test the algorithms are the following:
Random algorithm
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_RA True --episodes 100
```
Value iteration
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_VI True --episodes 2000 --gamma 0.99
```
SARSA
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_SA True --episodes 2000 --gamma 0.99 --alpha 0.4 --epsilon 0.2
```
SARSA-lambda
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_SAL True --episodes 2000 --gamma 0.99 --alpha 0.6 --epsilon 0.2 --lambda_s 0.2
```
Q-learning
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_Q True --episodes 2000 --gamma 0.99 --alpha 0.5 --epsilon 0.2
```
Deep Q-Network
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_DQN True --episodes 2000 --gamma 0.99 --epsilon_start 1 --epsilon_decay 0.999 --cuda True --replay_buffer_size 50000 --batch_size 1024
```
Actor-Critic
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_AC True --episodes 2000 --gamma 0.99 --epsilon_start 1 --epsilon_decay 0.999 --cuda True --replay_buffer_size 50000 --batch_size 1024
```