Sequential learning

This repo contains resources for testing a few algorithms for online convex optimization and for best arm identification in bandits.

Online convex optimization

  • Online gradient descent
  • Online gradient descent without a gradient
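
As a rough illustration of the first item, here is a minimal sketch of projected online gradient descent. It is not taken from this repo: the function names, the Euclidean-ball feasible set, and the step size `step_scale / sqrt(t)` are all assumptions chosen for the example.

```python
import math


def project_to_ball(x, radius):
    """Euclidean projection of x onto the ball of the given radius centred at 0."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def online_gradient_descent(gradients, dim, radius=1.0, step_scale=1.0):
    """Projected OGD (hypothetical helper, not the repo's API).

    gradients[t] is a function mapping the current point to the gradient
    of the t-th loss at that point. Returns the points played and the
    final iterate.
    """
    x = [0.0] * dim
    played = []
    for t, grad_fn in enumerate(gradients, start=1):
        played.append(list(x))            # point played in round t
        g = grad_fn(x)                    # gradient revealed after playing
        eta = step_scale / math.sqrt(t)   # assumed decaying step size
        x = project_to_ball([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return played, x


# Usage: every round has the loss ||x - target||^2, whose gradient is 2 (x - target).
target = [0.5, -0.25]
losses = [lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]] * 50
played, final = online_gradient_descent(losses, dim=2)
print(final)
```

The gradient-free variant in the second item would replace `grad_fn(x)` with an estimate built from loss values only; that estimator is not sketched here.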

Best arm identification in stochastic bandits

Fixed budget

  • Uniform sampling
  • Successive rejects
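
For orientation, the successive rejects strategy can be sketched as follows. This is a sketch under assumptions rather than the repo's implementation: the `pull` callback, the function name, and the returned tuple are hypothetical, and the phase lengths follow the usual schedule from the literature with a budget larger than the number of arms.

```python
import math
import random


def successive_rejects(pull, num_arms, budget):
    """Fixed-budget best arm identification by successive rejects.

    pull(arm) returns one stochastic reward (hypothetical callback).
    Assumes budget > num_arms. Returns (recommended arm, pull counts).
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    prev_n = 0
    for phase in range(1, num_arms):
        # Cumulative number of pulls each surviving arm has after this phase.
        n_k = math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - phase)))
        for arm in active:
            for _ in range(n_k - prev_n):
                sums[arm] += pull(arm)
                counts[arm] += 1
        prev_n = n_k
        # Reject the surviving arm with the lowest empirical mean.
        worst = min(active, key=lambda a: sums[a] / counts[a])
        active.remove(worst)
    return active[0], counts


# Usage: three Bernoulli arms with assumed means, budget of 3000 pulls.
rng = random.Random(0)
means = [0.2, 0.5, 0.8]
best, counts = successive_rejects(
    lambda a: 1.0 if rng.random() < means[a] else 0.0, num_arms=3, budget=3000
)
print(best, counts)
```

Uniform sampling corresponds to the simpler baseline of splitting the budget evenly across arms and recommending the arm with the highest empirical mean.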

Fixed confidence

  • UCB-based sampling with a heuristic approximation of the GLRT (generalized likelihood ratio test) stopping rule
  • Uniform sampling with a heuristic approximation of the GLRT stopping rule
  • TTUCB (top two with the arm selected by UCB as the leader)
  • EB-TC (top two with the empirically best arm as the leader)
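
To make the top-two idea concrete, here is a minimal EB-TC-style sketch for Gaussian arms with known variance. Everything specific in it is an assumption for illustration and may differ from the repo: the function names, the randomised leader/challenger choice with probability `beta`, and in particular the stopping threshold `log((1 + log t) / delta)`, which is only one common heuristic.

```python
import math
import random


def transportation_cost(means, counts, leader, other, sigma=1.0):
    """Gaussian GLR statistic for 'leader is better than other' (known variance)."""
    gap = max(means[leader] - means[other], 0.0)
    return gap ** 2 / (2.0 * sigma ** 2 * (1.0 / counts[leader] + 1.0 / counts[other]))


def eb_tc(pull, num_arms, delta, beta=0.5, sigma=1.0, max_rounds=100_000, rng=random):
    """EB-TC-style sampling with a heuristic GLRT stopping rule (illustrative only).

    pull(arm) returns one reward (hypothetical callback).
    Returns (recommended arm, number of samples used).
    """
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    for arm in range(num_arms):          # initialisation: pull every arm once
        sums[arm] += pull(arm)
        counts[arm] += 1
    t = num_arms
    while True:
        means = [s / c for s, c in zip(sums, counts)]
        # Leader: empirically best arm (the "EB" part).
        leader = max(range(num_arms), key=lambda a: means[a])
        # Challenger: arm with the smallest transportation cost (the "TC" part).
        costs = {a: transportation_cost(means, counts, leader, a, sigma)
                 for a in range(num_arms) if a != leader}
        challenger = min(costs, key=costs.get)
        threshold = math.log((1.0 + math.log(t)) / delta)  # assumed heuristic threshold
        if costs[challenger] > threshold or t >= max_rounds:
            return leader, t
        arm = leader if rng.random() < beta else challenger
        sums[arm] += pull(arm)
        counts[arm] += 1
        t += 1


# Usage: three unit-variance Gaussian arms with assumed means.
noise = random.Random(1)
mu = [0.0, 0.5, 1.0]
arm, samples = eb_tc(lambda a: noise.gauss(mu[a], 1.0), num_arms=3, delta=0.05,
                     rng=random.Random(2))
print(arm, samples)
```

In this sketch, choosing the leader as the arm with the largest upper confidence bound instead of the largest empirical mean would give a TTUCB-style variant, while the stopping check stays the same.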