Authors: Chen, Yutong; Xu, Yan; Hu, Minghua; Yang, Lei
Date deposited: 2022-01-18
Date issued: 2021-11-15
Citation: Chen Y, Xu Y, Hu M, Yang L (2021) Demand and capacity balancing technology based on multi-agent reinforcement learning. In: 2021 AIAA/IEEE 40th Digital Avionics Systems Conference (DASC), 3-7 October 2021, San Antonio, USA
ISSN: 2155-7209
DOI: https://doi.org/10.1109/DASC52595.2021.9594343
URI: https://dspace.lib.cranfield.ac.uk/handle/1826/17428
Abstract: To effectively solve Demand and Capacity Balancing (DCB) in large-scale, high-density scenarios through the Ground Delay Program (GDP) at the pre-tactical stage, a sequential decision-making framework based on a time window is proposed. On this basis, the problem is transformed into a Markov Decision Process (MDP) based on local observations, and a Multi-Agent Reinforcement Learning (MARL) method is adopted. Each flight is regarded as an independent agent that decides whether to implement GDP according to its local state observation. By combining multiple terms in the reward function, a Mixed Competition and Cooperation (MCC) mode that accounts for fairness is formed among the agents. To improve the efficiency of MARL, we use the double Q-Learning Network (DQN), experience replay, an adaptive ϵ-greedy strategy, and the Decentralized Training with Decentralized Execution (DTDE) framework. The experimental results show that the training process of the MARL method is convergent, efficient, and stable. Compared with the Computer-Assisted Slot Allocation (CASA) method used in actual operations, the number of flight delays and the average delay time are reduced by 33.7% and 36.7%, respectively.
Language: en
License: Attribution-NonCommercial 4.0 International (http://creativecommons.org/licenses/by-nc/4.0/)
Keywords: demand and capacity balancing; ground delay program; multi-agent reinforcement learning; double Q-learning network; experience replay; adaptive ϵ-greedy strategy; decentralized training with decentralized execution
Title: Demand and capacity balancing technology based on multi-agent reinforcement learning
Type: Conference paper
ISBN: 978-1-6654-3420-1
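
Note: The abstract's training setup can be illustrated with a short sketch. The Python code below is a minimal, hypothetical rendering of one independent flight agent under DTDE, combining a double-DQN target, an experience-replay buffer, and a decaying (adaptive) ϵ-greedy policy. The observation size (OBS_DIM), network shape, hyperparameters, and the per-transition reward are assumptions for illustration only, not the authors' implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical dimensions: a flight agent observes a small local-state
# vector (e.g. sector demand/capacity features) and picks one of two
# actions: 0 = depart as scheduled, 1 = take a ground delay (GDP).
OBS_DIM, N_ACTIONS = 8, 2

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS))

    def forward(self, x):
        return self.net(x)

class FlightAgent:
    """Independent learner (DTDE): each flight trains its own network
    from its own local observations and rewards; nothing is shared."""

    def __init__(self, gamma=0.99, lr=1e-3):
        self.q, self.q_target = QNet(), QNet()
        self.q_target.load_state_dict(self.q.state_dict())
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=10_000)  # experience replay
        self.gamma, self.eps = gamma, 1.0

    def act(self, obs):
        # Adaptive eps-greedy: explore with probability eps, which
        # decays after each learning step (see learn()).
        if random.random() < self.eps:
            return random.randrange(N_ACTIONS)
        with torch.no_grad():
            return int(self.q(torch.tensor(obs)).argmax())

    def learn(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        batch = random.sample(self.buffer, batch_size)
        obs, act, rew, nxt, done = map(torch.tensor, zip(*batch))
        # Double-Q target: the online net selects the next action,
        # the target net evaluates it.
        a_star = self.q(nxt.float()).argmax(1, keepdim=True)
        target = rew.float() + self.gamma * (1 - done.float()) * \
            self.q_target(nxt.float()).gather(1, a_star).squeeze(1)
        q_sa = self.q(obs.float()).gather(1, act.view(-1, 1)).squeeze(1)
        loss = nn.functional.smooth_l1_loss(q_sa, target.detach())
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        self.eps = max(0.05, self.eps * 0.995)  # decay exploration

    def sync_target(self):
        # Call periodically to refresh the target network.
        self.q_target.load_state_dict(self.q.state_dict())

# Toy usage: store one hypothetical transition and run one update.
agent = FlightAgent()
s = [0.0] * OBS_DIM
a = agent.act(s)
agent.buffer.append((s, a, -1.0, s, False))  # reward ~ negative delay cost
agent.learn()
```

Under DTDE, each flight would hold its own FlightAgent instance; the mixed competition/cooperation (MCC) reward shaping described in the abstract would enter through the per-flight reward term stored in the replay buffer.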