The final project for the 5SC28-Machine Learning for System and Control 2021/2022 course at TU Eindhoven. The project is about modelling the unbalanced disk and controlling it so that it can swing up and also having a
The modelling uses the NARX model structure, implemented as both a Gaussian Process and an Artificial Neural Network, using scikit-learn and PyTorch respectively. For the Gaussian Process, we used exact inference, so training may take several hours. The grid search Gaussian Process implementation also provides a good approximation.
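As a rough illustration of the NARX structure mentioned above, the sketch below builds the regressor matrix from past inputs and outputs. It is not the project's code: the helper name `make_narx_data`, the orders `na` and `nb`, and the toy linear system are all assumptions made for this example, and ordinary least squares stands in for the Gaussian Process or neural network so that the block needs only NumPy.

```python
import numpy as np

def make_narx_data(u, y, na, nb):
    """Build NARX regressors (hypothetical helper).

    Row k holds [u[k-nb], ..., u[k-1], y[k-na], ..., y[k-1]]; the target is y[k].
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    start = max(na, nb)  # first index with enough history
    X = np.array([np.concatenate([u[k - nb:k], y[k - na:k]])
                  for k in range(start, len(y))])
    return X, y[start:]

# Toy data from a known linear system (an assumed stand-in for disk measurements).
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=200)
y = np.zeros(200)
for k in range(1, 200):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1]

X, target = make_narx_data(u, y, na=1, nb=1)

# Stand-in model: least squares. Any regressor with a fit(X, target) style
# interface could be trained on the same pair instead.
theta, *_ = np.linalg.lstsq(X, target, rcond=None)
print(theta)  # coefficients for [u[k-1], y[k-1]]
```

The same `(X, target)` pair is what a nonlinear regressor would be trained on; only the function mapping past samples to the next output changes.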
The overall objective is to control an unbalanced disk so that it swings up, i.e. to learn a swing-up policy (see the Gym Unbalanced Disk library by Gerben Beintema). We implemented the following methods:
- DQN (Deep Q-Network) with stable-baselines3
- A2C (Advantage Actor Critic) with PyTorch. Use a2c_image.ipynb to generate figures and a2c_eval.py to evaluate the model.
- SAC (Soft Actor-Critic) with stable-baselines3
- Classical Q-Learning (Tabular Q-learning)
- Multi_SAC for multi-target policy $\pm10^{\circ}$ using the SAC method with stable-baselines3
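To make the tabular Q-learning entry in the list above concrete, here is a minimal sketch of the update rule with an epsilon-greedy policy. It runs on a hypothetical five-state chain rather than the unbalanced disk, and the environment, hyperparameters, and function names are assumptions for illustration only. Applying it to the disk would presumably also require discretizing the continuous angle and velocity, which is not shown here.

```python
import numpy as np

N_STATES, N_ACTIONS, GOAL = 5, 2, 4  # toy chain: states 0..4, goal at state 4

def step(state, action):
    """Move left (action 0) or right (action 1); reward 1 on reaching GOAL."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def train(episodes=300, alpha=0.5, gamma=0.9, eps=0.1, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < eps:
                a = int(rng.integers(N_ACTIONS))          # explore
            else:
                best = np.flatnonzero(Q[s] == Q[s].max())
                a = int(rng.choice(best))                 # greedy, ties broken at random
            s2, r, done = step(s, a)
            # Q-learning target: reward plus discounted best value of the next state
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
            if done:
                break
    return Q

Q = train()
policy = np.argmax(Q[:GOAL], axis=1)  # greedy action for each non-terminal state
print(policy)
```

Breaking ties at random matters early on: with an all-zero table, a plain `argmax` would always pick action 0 and the agent would rarely reach the goal.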