Data-Based Optimal Switching and Control With Admissibility Guaranteed Q-Learning

Zhengrong Xiang*, Pingchuan Li, Wencheng Zou, Choon Ki Ahn*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

This article addresses the data-based optimal switching and control codesign for discrete-time nonlinear switched systems via a two-stage approximate dynamic programming (ADP) algorithm. Through offline policy improvement and policy evaluation, the proposed algorithm iteratively determines the optimal hybrid control policy using system input/output data. Moreover, a rigorous convergence proof is given for the two-stage ADP algorithm. Admissibility, an essential property of the hybrid control policy, must be ensured for practical application. To this end, the properties of the hybrid control policies are analyzed and an admissibility criterion is obtained. To realize the proposed Q-learning algorithm, an actor-critic neural network (NN) structure is adopted that employs multiple NNs to approximate the Q-functions and control policies for the different subsystems. By applying the proposed admissibility criterion, the obtained hybrid control policy is guaranteed to be admissible. Finally, two numerical simulations verify the effectiveness of the proposed algorithm.
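As a rough illustration of the idea of iterating a Q-function over a hybrid action (active subsystem plus continuous input), the sketch below runs value iteration on a toy scalar switched linear system. It is not the article's algorithm: the system, the cost weights, and all function names are assumptions made for this example, the update uses the model in closed form instead of input/output data and neural networks, and the stability check is only a simple stand-in for an admissibility test.

```python
# Toy illustration only: scalar switched linear system with two modes (assumed values).
# Dynamics: x_next = A[v] * x + B[v] * u, stage cost: x**2 + R * u**2.
A = (1.2, 0.8)
B = (1.0, 0.3)
R = 1.0


def q_value_iteration(num_iters=200):
    """Iterate Q_{i+1}(x, v, u) = x^2 + R u^2 + min over (v', u') of Q_i(x_next, v', u').

    For this scalar problem V_i(x) = p_i * x^2, so only the coefficient p_i is tracked.
    """
    p = 0.0  # V_0 = 0, a common value-iteration initialization
    history = [p]
    for _ in range(num_iters):
        # Minimizing Q_{i+1} over u within mode v yields this cost coefficient.
        per_mode = [1.0 + R * p * a * a / (R + p * b * b) for a, b in zip(A, B)]
        p = min(per_mode)  # then minimize over the switching signal
        history.append(p)
    return p, history


def greedy_hybrid_policy(p):
    """Return the (mode, feedback gain) pair minimizing Q built from V(x) = p * x^2."""
    best = None
    for v, (a, b) in enumerate(zip(A, B)):
        gain = -p * a * b / (R + p * b * b)  # u = gain * x minimizes Q over u
        cost = 1.0 + R * gain ** 2 + p * (a + b * gain) ** 2
        if best is None or cost < best[2]:
            best = (v, gain, cost)
    return best[0], best[1]


def is_stabilizing(v, gain):
    """Stand-in admissibility check: the closed loop must be a contraction."""
    return abs(A[v] + B[v] * gain) < 1.0


if __name__ == "__main__":
    p_star, _ = q_value_iteration()
    mode, gain = greedy_hybrid_policy(p_star)
    print(mode, gain, is_stabilizing(mode, gain))
```

Starting from a zero value function, the coefficient sequence is nondecreasing and settles at a fixed point, after which the greedy rule selects one mode and one feedback gain for this toy problem. In a nonlinear, data-based setting the closed-form minimization would be replaced by function approximators trained on measured transitions.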

Original language: English
Pages (from-to): 5963-5973
Number of pages: 11
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 36
Issue number: 4
Publication status: Published - 2025

Bibliographical note

Publisher Copyright:
© 2012 IEEE.

Keywords

  • Approximate dynamic programming (ADP)
  • Q-learning
  • data-based control
  • neural networks (NNs)
  • optimal control
  • switched system
  • value iteration (VI)

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
