Reinforcement learning for control with multiple frequencies

Jongmin Lee, Byung Jun Lee, Kee Eung Kim

Research output: Contribution to journal › Conference article › peer-review

6 Citations (Scopus)


Many real-world sequential decision problems involve multiple action variables with different control frequencies, so that actions take effect at different periods. While these problems can be formulated with the notion of multiple action persistences in a factored-action MDP (FA-MDP), it is non-trivial to solve them efficiently, since an action-persistent policy constructed from a stationary policy can be arbitrarily suboptimal, rendering solution methods for standard FA-MDPs hardly applicable. In this paper, we formalize the problem of multiple control frequencies in RL and provide an efficient solution method. Our proposed method, Action-Persistent Policy Iteration (AP-PI), provides a theoretical guarantee of convergence to an optimal solution while incurring only a factor of |A| increase in time complexity during the policy improvement step, compared to standard policy iteration for FA-MDPs. Extending this result, we present Action-Persistent Actor-Critic (AP-AC), a scalable RL algorithm for high-dimensional control tasks. In the experiments, we demonstrate that AP-AC significantly outperforms the baselines on several continuous control tasks and a traffic control simulation, which highlights the effectiveness of our method, which directly optimizes the periodic non-stationary policy for tasks with multiple control frequencies.
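
To make the notion of multiple action persistences concrete, the following is a minimal illustrative sketch, not the authors' AP-PI or AP-AC implementation. It assumes each action variable k has an integer persistence c_k and may only change at time steps divisible by c_k; under that assumption the pattern of updatable variables repeats with a period equal to the least common multiple of the persistences. The function names and the example persistences are hypothetical and do not come from the paper.

```python
from functools import reduce
from math import gcd


def policy_period(persistences):
    """Least common multiple of the persistences (assumed period of the update pattern)."""
    return reduce(lambda a, b: a * b // gcd(a, b), persistences, 1)


def updatable_mask(t, persistences):
    """For time step t, mark which action variables are allowed to change."""
    return [t % c == 0 for c in persistences]


def apply_persistence(proposed, previous, t, persistences):
    """Keep the previous value of every action variable that must persist at step t."""
    mask = updatable_mask(t, persistences)
    return [new if free else old for new, old, free in zip(proposed, previous, mask)]


# Hypothetical example: three action variables updated every 1, 2, and 3 steps.
persistences = [1, 2, 3]
period = policy_period(persistences)

action = [0, 0, 0]
for t in range(period):
    proposed = [t + 1] * len(persistences)  # stand-in for a policy's output
    action = apply_persistence(proposed, action, t, persistences)
    print(t, updatable_mask(t, persistences), action)
```

Because the set of freely controllable variables depends on the time step modulo the period, a policy that ignores the time step cannot in general distinguish these situations, which is consistent with the abstract's emphasis on optimizing a periodic non-stationary policy.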

Original language: English
Journal: Advances in Neural Information Processing Systems
Publication status: Published - 2020
Externally published: Yes
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 2020 Dec 6 → 2020 Dec 12

Bibliographical note

Funding Information:
This work was supported by the National Research Foundation (NRF) of Korea (NRF-2019R1A2C1087634 and NRF-2019M3F2A1072238), the Ministry of Science and Information Communication Technology (MSIT) of Korea (IITP No. 2020-0-00940, IITP No. 2019-0-00075, IITP No. 2017-0-01779 XAI), and POSCO.

Publisher Copyright:
© 2020 Neural Information Processing Systems Foundation. All rights reserved.

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

