DPOC-Project

Programming Exercise part of the course Dynamic Programming and Optimal Control, ETH Zurich, academic year 2019/2020. Read ProgrammingExercise.pdf for a description of the problem and of the scripts. The aim of this programming exercise is to solve a stochastic shortest path problem using Value Iteration, Policy Iteration and Linear Programming. The scripts implemented by the student are:

  1. ComputeTerminalStateIndex.m
  2. ComputeTransitionProbabilities.m
  3. ComputeStageCosts.m
  4. PolicyIteration.m
  5. ValueIteration.m
  6. LinearProgramming.m

Run main.m to check the obtained solution.
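The repository's Dynamic Programming scripts are written in MATLAB; as a rough illustration of the core idea behind ValueIteration.m, here is a minimal Python sketch of value iteration for a stochastic shortest path problem (the array shapes and the toy example are assumptions, not the repository's actual data):

```python
import numpy as np

def value_iteration(P, G, tol=1e-6):
    """Value iteration sketch for a stochastic shortest path problem.

    P: (K, K, L) array, P[i, j, u] = probability of moving i -> j under input u
    G: (K, L) array, G[i, u] = expected stage cost of applying input u in state i
    Returns the optimal cost-to-go J (K,) and a greedy policy (K,).
    """
    K, L = G.shape
    J = np.zeros(K)
    while True:
        # Bellman update: stage cost plus expected cost-to-go, minimized over inputs.
        Q = G + np.einsum('ijl,j->il', P, J)
        J_new = Q.min(axis=1)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new, Q.argmin(axis=1)
        J = J_new
```

On a deterministic three-state chain with unit stage costs and an absorbing zero-cost terminal state, the iteration converges to the expected costs-to-go [2, 1, 0] in a few sweeps.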

RL extension

I have extended the programming exercise by solving the same stochastic shortest path problem using Reinforcement Learning (RL) algorithms:

  1. SARSA, with and without initialization from an expert
  2. Q-Learning, with and without initialization from an expert
  3. Double-Q-Learning, with and without initialization from an expert

When initialization from an expert is used, some trajectories are sampled with the optimal policy obtained through Dynamic Programming, and the Q-values of the visited state-action pairs are initialized to a higher value. The functions implementing SARSA, Q-Learning and Double-Q-Learning can be found in the scripts SARSA.m, Q_Learning.m and Double_Q_Learning.m, respectively. In these, an epsilon-greedy policy is used for exploration. SARSA and Q-Learning are also implemented with Upper Confidence Bounds (UCB) for exploration, in SARSA_UCB.m and Q_Learning_UCB.m, respectively.
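The RL scripts are likewise in MATLAB; the following Python sketch shows the shape of a tabular Q-learning loop with epsilon-greedy exploration and an optional warm-started Q-table (e.g. from expert trajectories, as described above). All names, the simulator interface, and the cost-minimization convention are assumptions for illustration, not Q_Learning.m's actual signature:

```python
import numpy as np

def q_learning(sample_step, K, L, terminal, episodes=500,
               alpha=0.1, eps=0.1, q_init=None, rng=None):
    """Tabular Q-learning sketch for a stochastic shortest path.

    Costs are minimized, so the greedy action is the argmin over Q.
    sample_step(i, u) -> (j, cost): simulator of one transition.
    q_init: optional warm start of the Q-table, e.g. built from
            expert trajectories sampled with the DP-optimal policy.
    """
    rng = np.random.default_rng() if rng is None else rng
    Q = np.zeros((K, L)) if q_init is None else q_init.copy()
    for _ in range(episodes):
        i = int(rng.integers(K))  # random initial state each episode
        while i != terminal:
            # epsilon-greedy: explore uniformly with prob. eps,
            # otherwise pick the cost-minimizing input
            u = int(rng.integers(L)) if rng.random() < eps else int(Q[i].argmin())
            j, cost = sample_step(i, u)
            # terminal state has zero cost-to-go
            target = cost + (0.0 if j == terminal else Q[j].min())
            Q[i, u] += alpha * (target - Q[i, u])
            i = j
    return Q
```

SARSA would differ only in the target, using the Q-value of the action actually selected in the next state rather than the min; UCB exploration would replace the epsilon-greedy branch with an action score that subtracts a visit-count-based confidence bonus.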
