Session
MA1 - AI6: Bandit and experiment

Presentations
Short-lived high-volume bandits
1Cornell University; 2Glance; 3Carnegie Mellon University
TBD

Markovian interference in experiments
1MIT; 2Carnegie Mellon University
TBD

Diffusion limits of multi-armed bandit experiments under optimism-based policies
Columbia Business School, United States of America
Our work provides new results on the arm-sampling behavior of the celebrated UCB family of multi-armed bandit algorithms, leading to several important insights. Among these, it is shown that arm-sampling rates under UCB are asymptotically deterministic, regardless of the problem complexity. This discovery facilitates new sharp asymptotic characterizations revealing profound distinctions between UCB and Thompson Sampling, such as an "incomplete learning" phenomenon characteristic of the latter.
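For readers unfamiliar with the policy family the last abstract studies, the following is a minimal illustrative sketch of the classical UCB1 index rule on Bernoulli arms; the function name, arm means, and horizon are assumptions for demonstration, not the setup of the paper above.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given means; return pull counts.

    Illustrative sketch: each round, pull the arm maximizing
    empirical mean + sqrt(2 * ln(t) / pulls), the classical UCB1 index.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    # Initialize by pulling each arm once.
    for a in range(k):
        counts[a] = 1
        sums[a] = float(rng.random() < means[a])
    for t in range(k, horizon):
        # Select the arm with the highest upper confidence bound.
        a = max(
            range(k),
            key=lambda i: sums[i] / counts[i]
            + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
        )
        counts[a] += 1
        sums[a] += float(rng.random() < means[a])
    return counts

# Hypothetical two-armed instance: the better arm accumulates most pulls.
counts = ucb1([0.2, 0.8], horizon=10_000)
```

Over a long horizon, the fraction of pulls each arm receives concentrates, which is the kind of arm-sampling behavior the abstract characterizes asymptotically.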