
Last Survivors Reinforcement Learning

This project provides a TorchRL reinforcement learning (RL) environment for the Dota 2 arcade game Last Survivors, along with other utilities used to interact with the game.

Installation

Dependencies are listed in requirements.txt and can be installed with pip install -r requirements.txt.
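
For example, a typical setup in a fresh virtual environment (assuming Python 3 with pip is available; the activation command varies by platform):

python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt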

Usage

Environment

The LastSurvivors environment takes the following arguments (an example constructor call follows the list):

  • Hero: The name of the hero to select.
  • Stage: The name of the stage to select.
  • Difficulty: The name of the difficulty to select.
  • Level: The level number to select.
  • Speed: The speed number to select.
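
For example, the sample training script below constructs the environment as:

env = LastSurvivors('Drow Ranger', 'tomb of the ancestors', 'expert', '1', '2')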

Note: Since menu options are selected through the detection of visual elements (template matching; see the sketch after this list):

  • The names of the image files within each directory in images/templates/menu/ are the valid choices for that parameter.
  • If a desired choice is missing, it can be added by saving a picture of the choice's menu element into the parameter's corresponding directory.
  • For example, to add support for selecting the hero "Sniper", save a screenshot of Sniper's menu element to images/templates/menu/heroes/ as Sniper.png.
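
Below is a minimal sketch of how such a template might be located on screen using OpenCV's matchTemplate; the function name and threshold are illustrative, and the repository's actual matching code may differ:

import cv2

def find_menu_element(screen_path, template_path, threshold=0.8):
    """Return the (x, y) center of the best template match, or None."""
    screen = cv2.imread(screen_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    # Slide the template over the screenshot and score each position.
    result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None  # no sufficiently confident match
    h, w = template.shape
    return (max_loc[0] + w // 2, max_loc[1] + h // 2)

# e.g. find_menu_element('screen.png', 'images/templates/menu/heroes/Sniper.png')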

Training Policies

Any reinforcement learning algorithm that can be implemented with PyTorch's TorchRL library can be used with the LastSurvivors environment.
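
As a quick sanity check before training, TorchRL's spec checker can be run against the environment (a minimal sketch, assuming LastSurvivors subclasses torchrl.envs.EnvBase, which its use with env.rollout below suggests):

from torchrl.envs.utils import check_env_specs
from env import LastSurvivors

env = LastSurvivors('Drow Ranger', 'tomb of the ancestors', 'expert', '1', '2')
check_env_specs(env)  # performs test rollouts and raises if the env's specs are inconsistent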

A sample training loop can be found in src/train.py; other examples are available in the TorchRL tutorials.

train.py:

"""
An example training loop for the LastSurvivors environment. 
Adapted from this example: https://pytorch.org/tutorials/advanced/pendulum.html#training-a-simple-policy
"""
print("\033c") # clear the terminal

import torch
from torch import nn
from tensordict.nn import TensorDictModule
import tqdm
from collections import defaultdict

from env import LastSurvivors
env = LastSurvivors('Drow Ranger', 'tomb of the ancestors', 'expert', '1', '2')  # hero, stage, difficulty, level, speed

torch.manual_seed(0)
env.set_seed(0)

# A small MLP policy; LazyLinear infers the input size from the first batch.
net = nn.Sequential(
    nn.LazyLinear(64),
    nn.Tanh(),
    # nn.LazyLinear(64),
    # nn.Tanh(),
    # nn.LazyLinear(64),
    # nn.Tanh(),
    nn.LazyLinear(1),
)
# Wrap the network so it reads and writes the tensordict keys the env expects.
policy = TensorDictModule(
    net,
    in_keys=["choices"],
    out_keys=["action"],
)

optim = torch.optim.Adam(policy.parameters(), lr=2e-3)

batch_size = 1
n_episodes = 2
pbar = tqdm.tqdm(range(n_episodes // batch_size))
# Cosine learning-rate schedule; stepped once per training iteration.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, 20_000)
logs = defaultdict(list)

for _ in pbar:
    # Collect a trajectory of up to 100 steps with the current policy.
    rollout = env.rollout(100, policy)
    traj_return = rollout["next", "reward"].mean()
    # Gradient ascent on the mean reward (descent on its negative).
    (-traj_return).backward()
    gn = torch.nn.utils.clip_grad_norm_(net.parameters(), 1.0)
    optim.step()
    optim.zero_grad()
    pbar.set_description(
        f"reward: {traj_return: 4.4f}, "
        f"last reward: {rollout[..., -1]['next', 'reward'].mean(): 4.4f}, gradient norm: {gn: 4.4f}"
    )
    logs["return"].append(traj_return.item())
    logs["last_reward"].append(rollout[..., -1]["next", "reward"].mean().item())
    scheduler.step()

def plot():
    import matplotlib
    from matplotlib import pyplot as plt

    is_ipython = "inline" in matplotlib.get_backend()
    if is_ipython:
        from IPython import display

    with plt.ion():
        plt.figure(figsize=(10, 5))
        plt.subplot(1, 2, 1)
        plt.plot(logs["return"])
        plt.title("returns")
        plt.xlabel("iteration")
        plt.subplot(1, 2, 2)
        plt.plot(logs["last_reward"])
        plt.title("last reward")
        plt.xlabel("iteration")
        if is_ipython:
            display.display(plt.gcf())
            display.clear_output(wait=True)
        plt.show(block=True)

plot()
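
Assuming the repository layout above, the example can be launched from the project root with python src/train.py; since the environment drives the game through on-screen template matching, the Dota 2 client with Last Survivors presumably needs to be running and visible while it trains.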

Figure 1: per-iteration returns (left) and last-step reward (right) from the sample training run.

Sample Video

Coming Soon
