ISPR project - Reinforcement Learning Play Ground

This repo is a playground for testing reinforcement learning algorithms on a labyrinth. The following reinforcement learning agents are implemented to solve this task:

  • Actor-Critic
  • Deep Q-Networks
  • Value Iteration / Policy Iteration (Dynamic programming)
  • SARSA / SARSA($\lambda$) / Q-Learning

Requirements

Before running the code, install the following modules:

  • numpy==1.21.0

  • random2==1.0.1

  • tqdm==4.61.1

  • matplotlib==3.4.2

  • torch==1.9.0

  • keyboard==0.13.5

  • pygame==2.0.1

All the requirements can be installed by running:

      pip install -r requirements.txt
    

The environment

The environment used in this task is custom made. It can automatically generate any labyrinth of size n×m (with n < m), with parameters that control the complexity of the maze. The maze-generation methods are located in maze_generator.py.

To generate a maze you can call the function as follows:
    
    from maze_generator import *

    generator = maze_generator()
    maze, path = generator.generate_maze(size_x=20, size_y=40, start_coord=[0,1], finish_coord=[19,39], n_of_turns=4, log=False)
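
Since matplotlib is already among the requirements, a quick way to sanity-check a generated maze is to plot it. This is a minimal sketch, not part of the repo, and it assumes the returned maze is a 2D array-like of 0 (wall) and 1 (path) values as described in the legend further below:

    import numpy as np
    import matplotlib.pyplot as plt

    # Assumption: `maze` (from the snippet above) is a 2D array-like of 0/1 values.
    maze_arr = np.array(maze)
    plt.imshow(maze_arr, cmap="gray")   # walls (0) appear dark, walkable cells (1) light
    plt.title("Generated maze")
    plt.show()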

Alternatively, you can call the load function if you have already generated and saved a maze:

    from maze_generator import *

    generator = maze_generator()
    maze, path, player_coords, win_cords = generator.load_maze_from_csv("saved_maze/maze1")

Rewards and actions

The possible actions the agent can take and the rewards it can gain are reported in the following table. In this environment the objective of the agent is to minimize the number of points it loses; the initial score is n_of_viable_states*0.4.

| Action ID | Action | Reward | Reward on backtrack | Reward on action fail |
| --- | --- | --- | --- | --- |
| "up" | Player tries to move up 1 step | -0.05 | -0.25 | -0.75 |
| "down" | Player tries to move down 1 step | -0.05 | -0.25 | -0.75 |
| "left" | Player tries to move left 1 step | -0.05 | -0.25 | -0.75 |
| "right" | Player tries to move right 1 step | -0.05 | -0.25 | -0.75 |
| "jump_down" | Player tries to jump down, moving 2 positions down | -0.15 | -0.25 | -0.75 |
| "jump_up" | Player tries to jump up, moving 2 positions up | -0.15 | -0.25 | -0.75 |
| "jump_right" | Player tries to jump right, moving 2 positions right | -0.15 | -0.25 | -0.75 |
| "jump_left" | Player tries to jump left, moving 2 positions left | -0.15 | -0.25 | -0.75 |

If the agent reaches the finishing state (win position), it gains +1 point.
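
As an illustration of how these numbers combine (a sketch, not code from the repo; the episode counts below are made up):

    # Hypothetical episode on a maze with 100 viable states:
    n_of_viable_states = 100
    score = n_of_viable_states * 0.4          # initial score = 40.0

    # 10 normal steps, 2 jumps, 1 failed action, then the win bonus:
    score += 10 * -0.05 + 2 * -0.15 + -0.75 + 1.0
    print(score)                              # 39.45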

An action is considered failed if the state after the action is the same as the state before it. For example, if the agent tries to go left but the only thing on its left is a wall, the action fails:

    01000000             01000000
    09111111  --left-->  09111111
    00000001             00000001
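
A minimal sketch of how the reward for a move could be derived from the table above (illustrative only, not the repo's actual implementation):

    def step_reward(prev_state, new_state, action, visited):
        """Illustrative reward rule based on the table above."""
        if new_state == prev_state:            # e.g. walking into a wall
            return -0.75                       # reward on action fail
        if new_state in visited:               # returning to an already visited cell
            return -0.25                       # reward on backtrack
        return -0.15 if action.startswith("jump") else -0.05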

The jump mechanic allows the agent to jump over walls; in other words, it allows the agent to skip over 1 space, e.g.:

    1000000                 1000000      1000111                 1000111
    1000100                 1000100      1000100                 1000100
    1911111 --jump_right--> 1119111  or  1190111 --jump_right--> 1110900 
    0000001                 0000001      1000100                 1000100
    0000001                 0000001      1111100                 1111100

(where 1 is a legal path cell, 0 is a wall, and 9 is the player)
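
One way the jump actions could be resolved is sketched below (an assumption-laden illustration, not the repo's code): the agent moves two cells in the chosen direction, and the action fails if the landing cell is not a legal path cell.

    JUMP_DIRS = {"jump_up": (-2, 0), "jump_down": (2, 0),
                 "jump_left": (0, -2), "jump_right": (0, 2)}

    def try_jump(maze, pos, action):
        # maze: 2D grid of 0 (wall) / 1 (path); pos: (row, col) of the player
        dr, dc = JUMP_DIRS[action]
        r, c = pos[0] + dr, pos[1] + dc
        if 0 <= r < len(maze) and 0 <= c < len(maze[0]) and maze[r][c] == 1:
            return (r, c)        # the cell jumped over may be a wall
        return pos               # landing on a wall or outside the grid: failed action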

Getting started

The algorithms can be run from main.py, either on a pre-existing maze or on a newly generated one.

The main.py has the following set of parameters:

Parameters to choose which RL algorithms to run

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --run_all | bool | True | Run all RL algorithms on the maze |
| --run_AC | bool | False | Run actor-critic on the maze |
| --run_DQN | bool | False | Run deep Q-network on the maze |
| --run_VI | bool | False | Run value iteration on the maze |
| --run_SA | bool | False | Run SARSA on the maze |
| --run_SAL | bool | False | Run SARSA-lambda on the maze |
| --run_Q | bool | False | Run Q-learning on the maze |
| --run_RA | bool | False | Run the random approach on the maze |

Parameters for the maze

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --maze_path | str | "" | Path to load an already existing maze.csv |
| --maze_name | str | "" | The name to assign to the generated maze |
| --save_path | str | "" | Path to save any generated file, e.g. policies |
| --maze_x | int | 20 | Height of the maze |
| --maze_y | int | 40 | Width of the maze |
| --maze_fp | int | 1 | The minimum number of focal points to add to the maze |

Parameters for deep models

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --batch_size | int | 512 | Batch size for the deep models |
| --epsilon_decay | float | 0.999 | The decay applied to epsilon after each episode |
| --episodes | int | 2000 | Number of episodes to run |
| --replay_buffer_size | int | 50000 | The size of the replay buffer from which batches are sampled |
| --cuda | bool | True | Use the GPU for training |
| --epsilon_start | float | 1 | The starting epsilon for the epsilon-greedy choice of actions in the deep models |

Parameters for RL

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --gamma | float | 0.99 | The discount factor |
| --epsilon | float | 0.2 | The epsilon for the epsilon-greedy choice of actions (does not apply to deep models) |
| --alpha | float | 0.5 | Alpha term for SARSA and Q-learning |
| --lambda_s | float | 0.2 | Lambda term for SARSA-lambda |
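
For reference, a minimal sketch of how these flags might be parsed (the real main.py may differ; the str2bool helper is an assumption so that flags like `--run_RA True` behave as expected):

    import argparse

    def str2bool(value):
        # argparse's type=bool treats any non-empty string as True,
        # so an explicit converter is assumed here.
        return str(value).lower() in ("true", "1", "yes")

    parser = argparse.ArgumentParser()
    parser.add_argument("--run_all", type=str2bool, default=True)
    parser.add_argument("--run_DQN", type=str2bool, default=False)
    parser.add_argument("--maze_path", type=str, default="")
    parser.add_argument("--episodes", type=int, default=2000)
    parser.add_argument("--gamma", type=float, default=0.99)
    parser.add_argument("--epsilon", type=float, default=0.2)
    args = parser.parse_args()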

Note that --run_all uses the same parameters for all the algorithms, which may prevent some of them from finding the ideal solution.
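
To clarify how the epsilon-related parameters interact, here is a rough illustration (not taken from the repo) of epsilon-greedy action selection: the tabular methods use the fixed --epsilon, while the deep models start from --epsilon_start and multiply it by --epsilon_decay after every episode.

    import random

    def select_action(q_values, epsilon):
        # q_values: dict mapping action name -> estimated value in the current state
        if random.random() < epsilon:
            return random.choice(list(q_values))    # explore
        return max(q_values, key=q_values.get)      # exploit

    epsilon = 1.0                 # --epsilon_start (deep models)
    for episode in range(2000):   # --episodes
        # ... run one episode, choosing actions with select_action(q, epsilon) ...
        epsilon *= 0.999          # --epsilon_decay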

The parameters used to test the algorithms are the following:

Random algorithm
    python3 ./main.py --maze_path ./saved_maze/maze4  --run_RA True --episodes 100

Value iteration
    python3 ./main.py --maze_path ./saved_maze/maze4  --run_VI True --episodes 2000 --gamma 0.99
    
SARSA 
    python3 ./main.py --maze_path ./saved_maze/maze4  --run_SA True --episodes 2000 --gamma 0.99 --alpha 0.4 --epsilon 0.2

SARSA-lambda 
    python3 ./main.py --maze_path ./saved_maze/maze4  --run_SAL True --episodes 2000 --gamma 0.99 --alpha 0.6 --epsilon 0.2 --lambda_s 0.2

Q-learning
    python3 ./main.py --maze_path ./saved_maze/maze4  --run_Q True --episodes 2000 --gamma 0.99 --alpha 0.5 --epsilon 0.2

Deep Q-Network
    python3 ./main.py --maze_path ./saved_maze/maze4  --run_DQN True --episodes 2000 --gamma 0.99 --epsilon_start 1 --epsilon_decay 0.999 --cuda True --replay_buffer_size 50000 --batch_size 1024

Actor Critic
    python3 ./main.py --maze_path ./saved_maze/maze4  --run_AC True --episodes 2000 --gamma 0.99 --epsilon_start 1 --epsilon_decay 0.999 --cuda True --replay_buffer_size 50000 --batch_size 1024
