(Update 2021.12.13) Source code is open now.
(Update 2021.01.11) More posts are available here.
The (introductory) notes cover Bandit Algorithms, MDPs, Model-free Methods, Value Function Approximation, and Policy Optimization. For state-of-the-art advances, one can refer directly to the papers and to some excellent blogs.
Hope you enjoy learning.