ROIDICE: Offline Return on Investment Maximization for Efficient Decision Making

  • Woosung Kim
  • Hayeong Lee
  • Jongmin Lee*
  • Byung Jun Lee*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper, we propose a novel policy optimization framework that maximizes Return on Investment (ROI) of a policy using a fixed dataset within a Markov Decision Process (MDP) equipped with a cost function. ROI, defined as the ratio between the return and the accumulated cost of a policy, serves as a measure of the efficiency of the policy. Despite the importance of maximizing ROI in various applications, it remains a challenging problem due to its nature as a ratio of two long-term values: return and accumulated cost. To address this, we formulate the ROI maximizing reinforcement learning problem as linear fractional programming. We then incorporate the stationary distribution correction (DICE) framework to develop a practical offline ROI maximization algorithm. Our proposed algorithm, ROIDICE, yields an efficient policy that offers a superior trade-off between return and accumulated cost compared to policies trained using existing frameworks.
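To make the ROI definition above concrete, the following is a minimal sketch of how the ratio could be estimated from logged trajectories. It is not the ROIDICE algorithm: the function names, the discount factor `gamma`, and the trajectory layout (a pair of reward and cost sequences) are assumptions introduced only for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical trajectory layout: (per-step rewards, per-step costs).
Trajectory = Tuple[Sequence[float], Sequence[float]]


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma^t * values[t]."""
    total = 0.0
    for t, v in enumerate(values):
        total += (gamma ** t) * v
    return total


def empirical_roi(trajectories: Sequence[Trajectory], gamma: float = 0.99) -> float:
    """Estimate ROI as (total discounted return) / (total discounted cost).

    Dividing the two totals equals dividing the two per-trajectory averages,
    so this is a ratio of two long-term values, not an average of ratios.
    """
    total_return = 0.0
    total_cost = 0.0
    for rewards, costs in trajectories:
        total_return += discounted_sum(rewards, gamma)
        total_cost += discounted_sum(costs, gamma)
    if total_cost <= 0.0:
        # Assumption: accumulated cost is strictly positive, so the ratio is defined.
        raise ValueError("accumulated cost must be positive")
    return total_return / total_cost


# Usage: two short trajectories, no discounting.
data = [([1.0, 1.0], [1.0, 1.0]), ([4.0, 0.0], [1.0, 0.0])]
print(empirical_roi(data, gamma=1.0))  # 2.0 (return 6.0 over cost 3.0)
```

Because the objective is a ratio of two expectations rather than a single expected sum, it does not decompose step by step the way an ordinary return does, which is the difficulty the abstract attributes to ROI maximization.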

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 37
Publication status: Published - 2024
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 2024 Dec 9 to 2024 Dec 15

Bibliographical note

Publisher Copyright:
© 2024 Neural Information Processing Systems Foundation. All rights reserved.

ASJC Scopus subject areas

  • Signal Processing
  • Information Systems
  • Computer Networks and Communications
