Abstract
We introduce a simple and effective regularization of the knowledge gradient (KG) and use it to present the first sublinear regret bound for KG-based algorithms. We construct the online learning with regularized knowledge gradients (ORKG) algorithm with an independent Gaussian belief model, and prove that ORKG achieves a sublinear regret upper bound with high probability on bounded independent Gaussian multi-armed bandit (MAB) problems. The theoretical properties of the regularized KG and the ORKG algorithm are analyzed, and the empirical characteristics of ORKG are validated with MAB benchmark simulations. ORKG shows top-tier performance comparable to select MAB algorithms with provable regret bounds.
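For readers unfamiliar with KG-based bandit policies, the following is a minimal sketch of a standard online knowledge-gradient index policy with an independent Gaussian belief model, which is the setting the abstract describes. It is not the paper's ORKG algorithm: the specific regularization introduced in the paper is not given in this record, so the `reg_weight` penalty below, along with the function names `kg_factor` and `run_bandit`, is a hypothetical placeholder for illustration only.

```python
# Sketch of an online KG-style index policy for a Gaussian MAB with
# independent Gaussian beliefs. NOT the ORKG algorithm from the paper;
# the regularization term here is a hypothetical stand-in.
import numpy as np
from scipy.stats import norm


def kg_factor(mu, sigma2, noise_var):
    """Standard knowledge-gradient factor under independent Gaussian beliefs."""
    # Std of the one-step change in the posterior mean: sigma^2 / sqrt(sigma^2 + noise_var).
    sigma_tilde = sigma2 / np.sqrt(sigma2 + noise_var)
    # Gap between each arm's posterior mean and the best competing mean.
    best_other = np.array([np.max(np.delete(mu, i)) for i in range(len(mu))])
    zeta = -np.abs(mu - best_other) / np.maximum(sigma_tilde, 1e-12)
    f = zeta * norm.cdf(zeta) + norm.pdf(zeta)
    return sigma_tilde * f


def run_bandit(true_means, noise_var=1.0, horizon=1000, reg_weight=0.0, seed=0):
    """Play a bounded Gaussian MAB with a KG-based index policy (sketch)."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    mu = np.zeros(k)             # posterior means
    sigma2 = np.full(k, 100.0)   # diffuse prior variances
    regret = 0.0
    for t in range(horizon):
        nu = kg_factor(mu, sigma2, noise_var)
        # Exploitation term + KG exploration bonus - hypothetical regularizer.
        index = mu + (horizon - t) * nu - reg_weight * nu**2
        arm = int(np.argmax(index))
        y = true_means[arm] + rng.normal(0.0, np.sqrt(noise_var))
        # Conjugate Gaussian posterior update for the played arm.
        post_var = 1.0 / (1.0 / sigma2[arm] + 1.0 / noise_var)
        mu[arm] = post_var * (mu[arm] / sigma2[arm] + y / noise_var)
        sigma2[arm] = post_var
        regret += np.max(true_means) - true_means[arm]
    return regret


if __name__ == "__main__":
    print(run_bandit(true_means=np.array([0.1, 0.3, 0.5, 0.7]), horizon=2000))
```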
Original language | English |
---|---|
Title of host publication | Advances in Knowledge Discovery and Data Mining - 26th Pacific-Asia Conference, PAKDD 2022, Proceedings |
Editors | João Gama, Tianrui Li, Yang Yu, Enhong Chen, Yu Zheng, Fei Teng |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 328-339 |
Number of pages | 12 |
ISBN (Print) | 9783031059353 |
DOIs | |
Publication status | Published - 2022 |
Event | 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2022 - Chengdu, China |
Duration | 16 May 2022 → 19 May 2022 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 13281 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 26th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2022 |
---|---|
Country/Territory | China |
City | Chengdu |
Period | 16 May 2022 → 19 May 2022 |
Bibliographical note
Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
- Knowledge gradient
- Online learning
- Regret analysis
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science