Description of the problem
Since RL aims to solve MDPs (Markov Decision Processes), our first task should be to decide on their representation. It should be designed so that RL algorithms can easily use these representations to find optimal or sub-optimal solutions.
MDPs have the following elements:
State
Actions
Transition Probabilities
Transition Rewards
Policy
Performance Metric
SMDPs (Semi-Markov Decision Processes) have an additional element: the time of transition.
How can each of the above elements be represented? One idea is to use a class that encapsulates them, as sketched below.
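To make the class idea concrete, here is a minimal sketch in Python (assuming Python is the project's language; all class and attribute names below are illustrative, not a proposed API). The policy and performance metric are left out of the container on the assumption that the policy is the solver's output rather than part of the problem definition, with a discount factor standing in for the performance-metric element.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (s, a, s') triple used as the key for transition-indexed quantities
Transition = Tuple[str, str, str]

@dataclass
class MDP:
    """Container bundling the elements of a Markov Decision Process."""
    states: List[str]                 # finite set of states
    actions: List[str]                # finite set of actions
    # P[(s, a, s')] -> probability of reaching s' from s under action a
    transition_probabilities: Dict[Transition, float] = field(default_factory=dict)
    # R[(s, a, s')] -> immediate reward for that transition
    transition_rewards: Dict[Transition, float] = field(default_factory=dict)
    # discount factor; stands in for the performance-metric element here (assumption)
    gamma: float = 0.95

@dataclass
class SMDP(MDP):
    """Semi-Markov extension: adds the time spent in each transition."""
    # T[(s, a, s')] -> (expected) time of transition
    transition_times: Dict[Transition, float] = field(default_factory=dict)
```

A usage example with a toy two-state problem, to show how an RL algorithm would receive a single object instead of loose dictionaries:

```python
toy = MDP(
    states=["s0", "s1"],
    actions=["stay", "move"],
    transition_probabilities={("s0", "move", "s1"): 1.0, ("s1", "stay", "s1"): 1.0},
    transition_rewards={("s0", "move", "s1"): 1.0, ("s1", "stay", "s1"): 0.0},
)
```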
Example of the problem
References/Other comments