Modern Reinforcement Learning

I try to explain modern Reinforcement Learning (RL) concepts in the era of Large Language Models (LLMs) through clean code and simple examples.

Note:

The code is not optimized for performance, but for clarity.
The examples are not meant to be practical but rather to illustrate the concepts.

For the algorithms, I use a simple decoder-only Transformer as the policy model and a simple environment (FrozenLake) as the task. In principle, the files can be adapted fairly easily to LLM reasoning tasks, i.e. to training a model to acquire reasoning capabilities.
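To make the task concrete, here is a minimal sketch of a FrozenLake-style grid world in plain Python. It is not the environment code used in this repository. It assumes the standard 4x4 map and the usual action numbering (0 = left, 1 = down, 2 = right, 3 = up), and it simplifies the dynamics to be deterministic (no slipping). The names `MAP`, `MOVES`, `step`, and `rollout` are hypothetical.

```python
# Standard 4x4 FrozenLake layout: S = start, F = frozen, H = hole, G = goal.
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]

# Action id -> (row delta, column delta): left, down, right, up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def step(state, action):
    """Apply one action; return (next_state, reward, done).

    Assumption: deterministic transitions. Moving into a wall leaves
    the agent where it is.
    """
    size = len(MAP)
    row, col = divmod(state, size)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), size - 1)
    col = min(max(col + d_col, 0), size - 1)
    tile = MAP[row][col]
    reward = 1.0 if tile == "G" else 0.0  # reward only on reaching the goal
    done = tile in "HG"                   # episode ends at a hole or the goal
    return row * size + col, reward, done


def rollout(actions, start=0):
    """Play a fixed action sequence; return a list of (state, action, reward)."""
    state, trajectory = start, []
    for action in actions:
        next_state, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory
```

For example, `rollout([1, 1, 2, 2, 1, 2])` walks from the start to the goal, and only the last step carries a reward of 1.0. A sparse reward signal like this is what the policy model has to learn from.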

Algorithms

PPO (Proximal Policy Optimization)

The PPO algorithm is implemented in the ppo.py file.
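As a quick orientation before reading the file, the central piece of PPO is the clipped surrogate objective. The sketch below shows it in plain Python and is not taken from ppo.py. The function name `ppo_clip_loss`, its arguments, and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import math


def ppo_clip_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to be minimized), averaged over samples.

    new_logps / old_logps: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for those actions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice: take the smaller of the two objectives.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate because the objective is maximized but a loss is minimized.
    return -total / len(advantages)
```

The `min` removes any incentive to push the ratio beyond the clip range in the direction that would increase the objective. A move that makes the objective worse is still penalized in full.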

GRPO (Group Relative Policy Optimization)

To be implemented.
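Until then, the idea behind the name can be sketched independently of this repository. In the common formulation, GRPO samples a group of rollouts for the same starting point and scores each one relative to its group, instead of using a learned value function. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` constant are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group to zero mean and roughly unit scale.

    rewards: total rewards of several rollouts sampled from the same start.
    eps: small constant that avoids division by zero when all rewards match.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation
    return [(r - mean) / (std + eps) for r in rewards]
```

With FrozenLake-style rewards such as `[1.0, 0.0, 0.0, 1.0]`, the successful rollouts get an advantage close to +1 and the failed ones close to -1. If every rollout in the group earns the same reward, all advantages are zero and the group contributes no learning signal.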
