I tried to make this project as generic as possible, both to avoid repeating code and to make it reusable in different robotics projects. I've implemented 4 main classes that I'll discuss briefly.
You can find the actual algorithm implementation here. I used only NumSharp to implement the algorithm.
A simple wrapper class that uses articulation bodies to emulate the basic functions of servos in embedded systems.
This is the abstract class that the Q-learning code interacts with, so the algorithm is independent of the user's environment implementation.
Here you can find the actual environment implementation (actions, states, functions, Q-table, etc.).
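For readers who want a quick reminder of what the algorithm does, here is a minimal sketch of the standard tabular Q-learning update, written in C. It is not the NumSharp code from this repo: the table size (`N_STATES`, `N_ACTIONS`) and the function names `max_q` and `q_update` are assumptions made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES  16  /* ASSUMPTION: illustrative table size, not the project's */
#define N_ACTIONS 4   /* ASSUMPTION: illustrative table size, not the project's */

/* Return the highest Q-value available from `state`. */
double max_q(double q[N_STATES][N_ACTIONS], size_t state)
{
    assert(state < N_STATES);
    double best = q[state][0];
    for (size_t a = 1; a < N_ACTIONS; ++a) {
        if (q[state][a] > best) {
            best = q[state][a];
        }
    }
    return best;
}

/* Standard update rule:
 * Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
 * alpha is the learning rate, gamma the discount factor. */
void q_update(double q[N_STATES][N_ACTIONS], size_t s, size_t a,
              double reward, size_t s_next, double alpha, double gamma)
{
    assert(s < N_STATES && a < N_ACTIONS);
    double target = reward + gamma * max_q(q, s_next);
    q[s][a] += alpha * (target - q[s][a]);
}
```

Calling `q_update(q, s, a, reward, s_next, alpha, gamma)` once per step, after the environment reports the reward and the next state, is all the learning loop needs from this part.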
SimulationTest.mp4
I'm glad to say that using this approach to the project has saved me countless hours of training time to reach the optimal values for the parameters.
I first started with the default parameters for the Q-learning algorithm and tested the whole system.
Whole.System.Test.mp4
Then, after training for 2 full episodes (about 1000 steps), I found that the robot had an idea of what the optimal action was but was still exploring for better ones. So I repeated the test, this time training 10 models at once with different parameters.
Randomparamters.testing.mp4
Finally, I settled on the 3 best parameter sets and made them compete against each other. These were the results after 30 minutes of training:
race.results.mp4
Winner.mp4
I began this project only as a base for understanding the algorithm in depth and for cutting down training and parameter-tuning time, but I ended up actually using the exact same code, down to the delay values! Although I only tweaked a few physics parameters in the settings, the simulation yielded pretty accurate results. This can be seen in the comparison between the C and C# implementations:
Screencast.from.07-21-2022.09.45.05.AM.mp4
We used 2 MG-995 servo motors with a stall torque ranging from 9.4 kg·cm (at 4.8 V) to 11 kg·cm (at 6 V). Operating voltage range: 4.8 V to 7.2 V.
The MG-995 PWM period is 20 ms. 0° corresponds to an on-time of 0.5 ms and 180° corresponds to an on-time of 2.5 ms. Timer 1 was used to control the 2 motors, with OCR1A controlling the on-time of motor 1 and OCR1B controlling the on-time of motor 2.
A rotary encoder was used to detect motion and its direction. The encoder values (1, 0) and direction (1, -1) are used to feed the reward back to the Q-learning algorithm.
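The angle-to-pulse mapping above is a simple linear interpolation, which can be sketched in plain C as below. The pulse limits and the 20 ms period come from the text. The tick rate is an assumption (a 16 MHz clock with a prescaler of 8, giving 2 timer ticks per microsecond), and the function names are hypothetical; the actual firmware may be set up differently.

```c
#include <assert.h>
#include <stdint.h>

#define SERVO_MIN_US    500u    /* 0 deg   -> 0.5 ms on-time (from the text) */
#define SERVO_MAX_US    2500u   /* 180 deg -> 2.5 ms on-time (from the text) */
#define SERVO_PERIOD_US 20000u  /* 20 ms PWM period (from the text) */
#define TICKS_PER_US    2u      /* ASSUMPTION: 16 MHz clock, prescaler 8 */

/* Linearly map an angle in [0, 180] degrees to an on-time in microseconds. */
uint16_t servo_pulse_us(uint8_t angle_deg)
{
    assert(angle_deg <= 180);
    uint32_t span = SERVO_MAX_US - SERVO_MIN_US;
    return (uint16_t)(SERVO_MIN_US + ((uint32_t)angle_deg * span) / 180u);
}

/* On-time in timer ticks: the value that would be written to OCR1A or OCR1B. */
uint16_t servo_compare_ticks(uint8_t angle_deg)
{
    return (uint16_t)(servo_pulse_us(angle_deg) * TICKS_PER_US);
}

/* Full PWM period in timer ticks, under the same tick-rate assumption. */
uint16_t servo_period_ticks(void)
{
    return (uint16_t)(SERVO_PERIOD_US * TICKS_PER_US);
}
```

Under these assumptions, 90° gives a 1500 µs pulse, i.e. 3000 ticks out of a 40000-tick period, and both values fit comfortably in the 16-bit Timer 1 registers.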
External interrupt INT0 is used to detect the reading of the rotary encoder.
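One way the encoder reading could be turned into a reward is sketched below. This is only a guess at the mapping, not the project's firmware: it assumes a two-channel encoder where channel A triggers INT0 and channel B is sampled inside the handler, that a low channel B means forward motion (this depends on the wiring), and that the reward is simply the encoder value multiplied by the direction. The function names are hypothetical.

```c
#include <assert.h>

/* Direction of one encoder step, meant to be evaluated in the INT0 handler.
 * ASSUMPTION: channel B low at that moment means forward (+1); swap the
 * signs if the encoder is wired the other way round. */
int encoder_direction(int channel_b_level)
{
    assert(channel_b_level == 0 || channel_b_level == 1);
    return channel_b_level ? -1 : 1;
}

/* Combine the two readings into a reward:
 * moved     = 1 if a step was detected during the action, 0 otherwise
 * direction = +1 or -1
 * ASSUMPTION: reward = moved * direction, so +1 forward, -1 backward, 0 idle. */
int encoder_reward(int moved, int direction)
{
    assert(moved == 0 || moved == 1);
    assert(direction == 1 || direction == -1);
    return moved * direction;
}
```

With this mapping, an action that pushes the robot forward is rewarded, one that drags it backward is penalized, and an action that produces no motion is neutral.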
The robot has 2 power sources:
1) Two AAA batteries of 6 V and 500 mA to power the encoder and the AVR microcontroller.
2) A 5000 mA lithium bulk battery with a voltage range of up to 15 V to power the 2 servo motors.