This repo is a playground for testing reinforcement learning algorithms on a labyrinth. The following agents are implemented to solve this task:
- Actor-Critic
- Deep-Q Networks
- Value Iteration / Policy Iteration (Dynamic programming)
- SARSA / SARSA($\lambda$) / Q-Learning
Before running the code, install the following required modules:
- numpy==1.21.0
- random2==1.0.1
- tqdm==4.61.1
- matplotlib==3.4.2
- torch==1.9.0
- keyboard==0.13.5
- pygame==2.0.1
All the requirements can be installed by running `pip install -r requirements.txt`.
The environment used in this task is custom-made. It was created so that any labyrinth of size n×m (with n < m) can be generated automatically, with parameters that control the complexity of the maze. The methods for generating the maze are located in `maze_generator.py`.
To generate a maze, you can call the generator as follows:
```python
from maze_generator import *

generator = maze_generator()
maze, path = generator.generate_maze(size_x=20, size_y=40, start_coord=[0, 1], finish_coord=[19, 39], n_of_turns=4, log=False)
```
Alternatively, you can call the load function if you have already generated and saved a maze:
```python
from maze_generator import *

generator = maze_generator()
maze, path, player_coords, win_cords = generator.load_maze_from_csv("saved_maze/maze1")
```
The possible actions the agent can take, and the rewards it can gain, are reported in the following table. In this environment the objective of the agent is to minimize the number of points it loses; the initial score is `n_of_viable_states * 0.4`.
Action ID | Action | Reward | Reward on backtrack | Reward on action fail |
---|---|---|---|---|
"up" | Player tries to move up 1 step | -0.05 | -0.25 | -0.75 |
"down" | Player tries to move down 1 step | -0.05 | -0.25 | -0.75 |
"left" | Player tries to move left 1 step | -0.05 | -0.25 | -0.75 |
"right" | Player tries to move right 1 step | -0.05 | -0.25 | -0.75 |
"jump_down" | Player tries to jump down, moving 2 positions down | -0.15 | -0.25 | -0.75 |
"jump_up" | Player tries to jump up, moving 2 positions up | -0.15 | -0.25 | -0.75 |
"jump_right" | Player tries to jump right, moving 2 positions right | -0.15 | -0.25 | -0.75 |
"jump_left" | Player tries to jump left, moving 2 positions left | -0.15 | -0.25 | -0.75 |
If the agent reaches the finishing state (win position), it gains +1 point.
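As a rough illustration of the reward scheme above, the per-step reward could be computed as in the following sketch (hypothetical names such as `step_reward` and `visited`; this is not the repository's actual implementation):

```python
# Sketch of the reward scheme described in the table above.
# Hypothetical helper, not the repository's actual code.
STEP_COST = {
    "up": -0.05, "down": -0.05, "left": -0.05, "right": -0.05,
    "jump_up": -0.15, "jump_down": -0.15, "jump_left": -0.15, "jump_right": -0.15,
}

def step_reward(action, old_pos, new_pos, visited, win_pos):
    """Return the reward of a single transition."""
    if new_pos == win_pos:
        return 1.0        # reached the finishing state
    if new_pos == old_pos:
        return -0.75      # action failed (e.g. walked or jumped into a wall)
    if new_pos in visited:
        return -0.25      # backtracked to an already visited cell
    return STEP_COST[action]
```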
An action is considered failed if the state after the action is unchanged. For example, if the agent tries to go left but the only thing on its left is a wall, the action fails:
```
01000000              01000000
09111111  --left-->   09111111
00000001              00000001
```
The jump mechanic allows the agent to jump over walls; in other words, it allows the agent to skip over one cell, e.g.:
```
1000000                 1000000        1000111                 1000111
1000100                 1000100        1000100                 1000100
1911111 --jump_right--> 1119111   or   1190111 --jump_right--> 1110900
0000001                 0000001        1000100                 1000100
0000001                 0000001        1111100                 1111100
```
(where 1 is a legal path cell, 0 is a wall, and 9 is the player)
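A minimal sketch of how a move or jump could be resolved against this grid representation (assuming `maze` is a list of rows with 1 = path and 0 = wall; `resolve_move` is a hypothetical helper, not the repository's actual code):

```python
# Hypothetical movement resolution: steps move 1 cell, jumps move 2 cells,
# so a jump can clear a single wall because only the landing cell is checked.
DELTAS = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
    "jump_up": (-2, 0), "jump_down": (2, 0), "jump_left": (0, -2), "jump_right": (0, 2),
}

def resolve_move(maze, pos, action):
    """Return the new (row, col) position, or the old one if the action fails."""
    dr, dc = DELTAS[action]
    r, c = pos[0] + dr, pos[1] + dc
    inside = 0 <= r < len(maze) and 0 <= c < len(maze[0])
    if inside and maze[r][c] == 1:   # landing cell must be a path cell
        return (r, c)
    return pos                       # blocked or out of bounds: the action fails
```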
The algorithms can be called from `main.py`; they can be used either on a pre-existing maze or on a newly generated one.
`main.py` accepts the following parameters:
Parameter | Type | Default | Description |
---|---|---|---|
Parameters to choose which RL algorithms to run | | | |
--run_all | bool | True | Run all RL algorithms on the maze |
--run_AC | bool | False | Run Actor-Critic on the maze |
--run_DQN | bool | False | Run Deep Q-Network on the maze |
--run_VI | bool | False | Run value iteration on the maze |
--run_SA | bool | False | Run SARSA on the maze |
--run_SAL | bool | False | Run SARSA-lambda on the maze |
--run_Q | bool | False | Run Q-learning on the maze |
--run_RA | bool | False | Run the random approach on the maze |
Parameters for the maze | | | |
--maze_path | str | "" | Path to an existing maze .csv file to load |
--maze_name | str | "" | The name to assign to the generated maze |
--save_path | str | "" | Path to save any generated file, e.g. policies |
--maze_x | int | 20 | Height of the maze |
--maze_y | int | 40 | Width of the maze |
--maze_fp | int | 1 | The minimum number of focal points to add to the maze |
Parameters for deep models | | | |
--batch_size | int | 512 | Batch size for the deep models |
--epsilon_decay | float | 0.999 | The decay applied to epsilon at each episode |
--episodes | int | 2000 | Number of episodes to run |
--replay_buffer_size | int | 50000 | The size of the replay buffer from which the batches are sampled |
--cuda | bool | True | Use the GPU for training |
--epsilon_start | float | 1 | The starting value of epsilon for the epsilon-greedy choice of actions in the deep models |
Parameters for RL | | | |
--gamma | float | 0.99 | The discount factor |
--epsilon | float | 0.2 | The epsilon value for the epsilon-greedy choice of actions (does not apply to deep models) |
--alpha | float | 0.5 | Alpha (learning rate) term for SARSA and Q-learning |
--lambda_s | float | 0.2 | Lambda term for SARSA-lambda |
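For reference, `--alpha` and `--gamma` enter the tabular methods through the standard update rules:

$$Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma\,Q(s',a') - Q(s,a)\big] \qquad \text{(SARSA)}$$

$$Q(s,a) \leftarrow Q(s,a) + \alpha\,\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big] \qquad \text{(Q-learning)}$$

In SARSA($\lambda$), `--lambda_s` sets the decay of the eligibility traces, which shrink by a factor $\gamma\lambda$ at every step.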
Using `--run_all` applies the same parameters to all the algorithms, which may prevent some of them from finding their ideal solution.
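As an illustration of how `--epsilon_start`, `--epsilon_decay` and `--epsilon` are typically used, here is a hedged sketch of epsilon-greedy action selection with per-episode decay (hypothetical `select_action` helper, not the repository's actual code):

```python
import random

def select_action(q_values, epsilon):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

epsilon = 1.0           # --epsilon_start (deep models); tabular methods keep --epsilon fixed
epsilon_decay = 0.999   # --epsilon_decay
for episode in range(2000):  # --episodes
    # ... run one episode, calling select_action(q_values, epsilon) at each step ...
    epsilon *= epsilon_decay  # decay once per episode
```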
The parameters used to test the algorithms are the following:
Random algorithm
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_RA True --episodes 100
```
Value iteration
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_VI True --episodes 2000 --gamma 0.99
```
SARSA
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_SA True --episodes 2000 --gamma 0.99 --alpha 0.4 --epsilon 0.2
```
SARSA-lambda
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_SAL True --episodes 2000 --gamma 0.99 --alpha 0.6 --epsilon 0.2 --lambda_s 0.2
```
Q-learning
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_Q True --episodes 2000 --gamma 0.99 --alpha 0.5 --epsilon 0.2
```
Deep Q-Network
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_DQN True --episodes 2000 --gamma 0.99 --epsilon_start 1 --epsilon_decay 0.999 --cuda True --replay_buffer_size 50000 --batch_size 1024
```
Actor-Critic
```
python3 ./main.py --maze_path ./saved_maze/maze4 --run_AC True --episodes 2000 --gamma 0.99 --epsilon_start 1 --epsilon_decay 0.999 --cuda True --replay_buffer_size 50000 --batch_size 1024
```