Reducing Dueling Bandits to Cardinal Bandits

Publication

Jun 22, 2014

Abstract

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and from revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. In addition, we are the first to provide an almost optimal regret bound in terms of second-order terms, such as the differences between the values of the arms. We present three such algorithms: Doubler, MultiSbm and Sparring. For Doubler and MultiSbm we prove regret upper bounds in both finite and infinite settings, and we conjecture about the performance of Sparring, which empirically outperforms the other two as well as previous algorithms in our experiments.
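To make the contrast between ordinal and cardinal feedback concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a linear link, P(a beats b) = (1 + mu[a] - mu[b]) / 2, with hidden arm values in [0, 1], and it uses plain UCB1 as a stand-in for "any conventional Multi-Armed Bandit algorithm". The names `duel`, `UCB1`, and `sparring_style` are hypothetical. The loop follows the general idea of letting two cardinal learners each pick one arm of a duel and rewarding the learner whose arm wins; the exact definition of Sparring in the paper may differ.

```python
import math
import random


def duel(mu, a, b, rng):
    """Return True if arm a wins the duel against arm b.

    Assumption: linear link, P(a beats b) = (1 + mu[a] - mu[b]) / 2,
    where mu holds hidden arm values in [0, 1]. The learner only ever
    sees the win/loss outcome (ordinal), never mu itself (cardinal).
    """
    return rng.random() < (1.0 + mu[a] - mu[b]) / 2.0


class UCB1:
    """Plain UCB1 for rewards in [0, 1]; a stand-in cardinal bandit."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before using confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda i: self.sums[i] / self.counts[i]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[i]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def sparring_style(mu, horizon, seed=0):
    """Run two cardinal learners against each other for `horizon` duels."""
    rng = random.Random(seed)
    left, right = UCB1(len(mu)), UCB1(len(mu))
    for _ in range(horizon):
        a, b = left.select(), right.select()
        a_wins = duel(mu, a, b, rng)
        # The duel outcome is turned into a 0/1 reward for each learner.
        left.update(a, 1.0 if a_wins else 0.0)
        right.update(b, 0.0 if a_wins else 1.0)
    return left.counts, right.counts
```

For instance, `sparring_style([0.9, 0.5, 0.1], horizon=2000)` returns how often each learner played each arm. Any other cardinal bandit algorithm exposing `select` and `update` could replace `UCB1` in this sketch.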

Venue:

ICML 2014

Type:

Conference/Workshop Paper

Authors:

Nir Ailon

BibTeX

@inproceedings{
  author = {Nir Ailon},
  title = {Reducing Dueling Bandits to Cardinal Bandits},
  booktitle = {Proceedings of ICML 2014},
  year = {2014}
}
