We propose a method for learning multi-agent policies to compete against multiple opponents. The method combines recurrent actor-critic networks with deterministic policy gradients that promote cooperation between agents through communication. The learning process does not require access to opponents' parameters or observations because the agents are trained separately from the opponents. The actor networks enable the agents to communicate via forward and backward paths, while the critic network helps to train the actors by delivering gradient signals based on each agent's contribution to the global reward. Moreover, to address the nonstationarity caused by the other agents' evolving policies, we propose approximate model learning using auxiliary prediction networks that model the state transitions, the reward function, and opponent behavior. In the test phase, we use competitive multi-agent environments to demonstrate, by comparison, the usefulness and superiority of the proposed method in terms of learning efficiency and goal achievement. The comparison results show that the proposed method outperforms the alternatives.
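The auxiliary prediction networks mentioned above learn an approximate model of the environment from experience. As a minimal sketch of that idea (not the authors' implementation), the example below fits a single linear prediction head to unknown linear state-transition dynamics by stochastic gradient descent; the dimensions, learning rate, and linear form are illustrative assumptions only.

```python
import numpy as np

# Hypothetical sketch: an auxiliary state-transition prediction head
# f(s, a) ≈ s', trained by SGD on synthetic linear dynamics. This
# illustrates "approximate model learning", not the paper's exact model.
rng = np.random.default_rng(0)

state_dim, action_dim = 4, 2
# Unknown ground-truth dynamics: s' = A s + B a (assumed for the demo)
A = rng.normal(size=(state_dim, state_dim)) * 0.3
B = rng.normal(size=(state_dim, action_dim)) * 0.3

# Auxiliary prediction head: a single linear layer with weights W
W = np.zeros((state_dim, state_dim + action_dim))
lr = 0.1

for _ in range(2000):
    s = rng.normal(size=state_dim)
    a = rng.normal(size=action_dim)
    x = np.concatenate([s, a])
    s_next = A @ s + B @ a          # environment transition
    pred = W @ x                    # auxiliary head's prediction
    err = pred - s_next
    W -= lr * np.outer(err, x)      # squared-error gradient step

# After training, the head should predict unseen transitions accurately
s = rng.normal(size=state_dim)
a = rng.normal(size=action_dim)
mse = float(np.mean((W @ np.concatenate([s, a]) - (A @ s + B @ a)) ** 2))
print(mse)
```

In the paper's setting, analogous heads would also predict the reward and the opponents' actions, giving each agent a learned model that stabilizes training as the other agents change.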
Funding Information:
This research was supported by the Brain Korea PLUS Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning (NRF-2016R1A2B1008994 to SBK); the Ministry of Trade, Industry and Energy under the Industrial Technology Innovation Program (R1623371 to SBK); the Korea Creative Content Agency, Culture Technology (CT) Research and Development Program 2019; and an Institute for Information and Communications Technology Promotion grant funded by the Korean government (No. 2018-0-00440 to SBK, ICT-based Crime Risk Prediction and Response Platform Development for Early Awareness of Risk Situations).
© 2019 Park et al.