mertensu/modern-RL

Modern Reinforcement Learning

I try to explain modern Reinforcement Learning (RL) concepts from the era of Large Language Models (LLMs) using clean code and simple examples.

Note:

  1. The code is not optimized for performance, but for clarity.
  2. The examples are not meant to be practical but rather to illustrate the concepts.

For the algorithms, I use a simple decoder-only Transformer as the policy model and a simple environment (FrozenLake) as the task. In principle, the files could be adapted fairly easily to LLM reasoning tasks, i.e. to train a model to acquire reasoning capabilities.
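To make the task concrete, here is a minimal sketch of a FrozenLake-style grid world. It is a hypothetical, deterministic stand-in written for illustration, not the environment code used in this repository. The 4x4 map and the action numbering (0 = left, 1 = down, 2 = right, 3 = up) are assumed to follow the common FrozenLake layout, and the names `MAP`, `MOVES`, `step`, and `rollout` are made up for this example.

```python
# Hypothetical, deterministic stand-in for FrozenLake (not the repo's code).
# S = start, F = frozen (safe), H = hole (episode ends), G = goal (reward 1).
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


def step(state, action):
    """Return (next_state, reward, done) for one move on the grid."""
    n = len(MAP)
    row, col = divmod(state, n)
    d_row, d_col = MOVES[action]
    # Moves that would leave the grid keep the agent in place.
    row = min(max(row + d_row, 0), n - 1)
    col = min(max(col + d_col, 0), n - 1)
    tile = MAP[row][col]
    reward = 1.0 if tile == "G" else 0.0
    done = tile in "GH"
    return row * n + col, reward, done


def rollout(actions, start=0):
    """Play a fixed action sequence; return visited states and total reward."""
    state, states, total = start, [start], 0.0
    for action in actions:
        state, reward, done = step(state, action)
        states.append(state)
        total += reward
        if done:
            break
    return states, total
```

In an RL loop, the fixed action sequence passed to `rollout` would instead be sampled step by step from the policy model, and the sparse reward at the goal is the only learning signal.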

Algorithms

PPO (Proximal Policy Optimization)

The PPO algorithm is implemented in the ppo.py file.
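As a rough illustration of the core idea, the sketch below computes PPO's clipped surrogate loss from log-probabilities and advantages. It is a minimal sketch in plain Python under stated assumptions, not a copy of `ppo.py`: the function name `ppo_clip_loss`, its parameters, and the default `clip_eps=0.2` are chosen here for illustration, and value-function and entropy terms are left out.

```python
import math


def ppo_clip_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean negative clipped surrogate over a batch of sampled actions.

    new_logps / old_logps: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while the surrogate is maximized.
    return -total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that generated the data: once the ratio leaves the clip range in the direction favored by the advantage, the objective stops improving.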

GRPO (Group Relative Policy Optimization)

To be implemented.
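Until an implementation is available, the main difference from PPO can be sketched as follows: GRPO samples a group of rollouts for the same starting point and uses each rollout's reward, normalized by the group's mean and standard deviation, as its advantage, so no learned value function is needed. The code below is a minimal sketch under stated assumptions, not the planned implementation: the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are choices made for this example.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of rollouts to zero mean, unit scale."""
    mean = statistics.fmean(rewards)
    # Population standard deviation is an assumption; eps avoids division by
    # zero when every rollout in the group received the same reward.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, a group of four FrozenLake rollouts with rewards `[1.0, 0.0, 0.0, 1.0]` gives positive advantages to the two successful rollouts and negative advantages to the two failures. If all rollouts in a group receive the same reward, every advantage is zero and the group contributes no gradient.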
