This repo contains resources for testing a few online convex optimization and best-arm identification (bandit) algorithms:
- Online gradient descent (see the first sketch after this list)
- Online gradient descent without a gradient
- Uniform sampling
- Successive rejects (see the second sketch after this list)
- UCB-based sampling with a heuristic approximation of the GLRT stopping rule
- Uniform sampling with a heuristic approximation of the GLRT stopping rule
- TTUCB (Top Two with the leader chosen by UCB)
- EB-TC (Top Two with the empirical best arm as the leader)
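
For reference, below is a minimal sketch of projected online gradient descent. It is not taken from this repo's code: the function names (`online_gradient_descent`, `project_unit_ball`), the toy quadratic losses, and the `eta_t ~ 1/sqrt(t)` step-size schedule are illustrative assumptions.

```python
import numpy as np

def online_gradient_descent(grad_fns, project, x0, step_sizes):
    """Projected online gradient descent (illustrative sketch).

    grad_fns   : list of callables; grad_fns[t](x) is the (sub)gradient of the
                 loss revealed at round t
    project    : callable projecting a point back onto the feasible set
    x0         : starting point
    step_sizes : sequence of step sizes eta_t
    Returns the list of iterates played at each round.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for grad, eta in zip(grad_fns, step_sizes):
        x = project(x - eta * grad(x))   # gradient step, then projection
        iterates.append(x.copy())
    return iterates


if __name__ == "__main__":
    # Toy example: losses f_t(x) = ||x - c_t||^2 on the unit ball.
    rng = np.random.default_rng(0)
    centers = rng.uniform(-1, 1, size=(50, 2))
    grads = [lambda x, c=c: 2.0 * (x - c) for c in centers]

    def project_unit_ball(x):
        norm = np.linalg.norm(x)
        return x if norm <= 1.0 else x / norm

    etas = [1.0 / np.sqrt(t + 1) for t in range(len(grads))]
    xs = online_gradient_descent(grads, project_unit_ball, np.zeros(2), etas)
    print(xs[-1])
```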
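
Likewise, here is a sketch of the Successive Rejects strategy for fixed-budget best-arm identification, again independent of this repo's implementation; the `pull` callback and the Bernoulli toy arms are placeholders.

```python
import math
import numpy as np

def successive_rejects(pull, n_arms, budget):
    """Successive Rejects with a fixed sampling budget (illustrative sketch).

    pull(arm) returns one stochastic reward for the given arm.
    budget should be comfortably larger than n_arms.
    Returns the index of the single arm surviving all rejection phases.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, n_arms + 1))
    n_prev = 0
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)

    for k in range(1, n_arms):                    # K - 1 phases
        n_k = math.ceil((budget - n_arms) / (log_bar * (n_arms + 1 - k)))
        for arm in active:
            for _ in range(n_k - n_prev):
                counts[arm] += 1
                sums[arm] += pull(arm)
        # Reject the active arm with the lowest empirical mean.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)
        n_prev = n_k
    return active[0]


if __name__ == "__main__":
    # Toy example: Bernoulli arms with means 0.3, 0.5, 0.7.
    rng = np.random.default_rng(1)
    means = [0.3, 0.5, 0.7]
    best = successive_rejects(lambda a: rng.binomial(1, means[a]), 3, 600)
    print("identified arm:", best)
```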