tictactoe-RL

The committed policy was trained on 100,000 matches (the number can be edited in the main function). To use it, just run main.py.
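
For readers who want a feel for what training over a configurable number of matches can look like, here is a minimal sketch. It is not the code in main.py: the names `train`, `winner`, `num_matches`, `alpha`, and `epsilon` are hypothetical, and the method shown (tabular value learning through epsilon-greedy self-play) is an assumption about the general approach, not a description of this repo's implementation.

```python
import random

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' for a completed line, 'draw' for a full board, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None


def train(num_matches=100_000, alpha=0.2, epsilon=0.1, seed=0):
    """Hypothetical self-play trainer (an assumption, not the repo's main function).

    Returns a dict mapping a board string (the position right after a move)
    to the estimated value of that position for the player who just moved.
    """
    rng = random.Random(seed)
    values = {}
    for _ in range(num_matches):
        board = [' '] * 9
        history = {'X': [], 'O': []}  # positions each player produced
        player, result = 'X', None
        while result is None:
            moves = [i for i, cell in enumerate(board) if cell == ' ']
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore
            else:
                def score(m):
                    nxt = board[:]
                    nxt[m] = player
                    return values.get(''.join(nxt), 0.5)
                move = max(moves, key=score)  # exploit current estimates
            board[move] = player
            history[player].append(''.join(board))
            result = winner(board)
            player = 'O' if player == 'X' else 'X'
        # Propagate the final outcome backwards through each player's positions.
        for p in 'XO':
            target = 0.5 if result == 'draw' else (1.0 if result == p else 0.0)
            for state in reversed(history[p]):
                old = values.get(state, 0.5)
                values[state] = old + alpha * (target - old)
                target = values[state]
    return values
```

Calling `train(num_matches=1_000)` gives a quick trial run; raising `num_matches` toward 100,000 mirrors the match count mentioned above, at the cost of a longer run.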