Deep Q-learning policy optimization method for enhancing generalization in autonomous vehicle control

Authors

Andrii Pysarenko, Mykhailo Drahan

DOI:

https://doi.org/10.20535/2786-8729.7.2025.341723

Keywords:

deep Q-learning, autonomous vehicle, policy generalization, reward function, dynamic initial conditions, cyber-physical systems

Abstract

The development of autonomous vehicle control policies based on deep reinforcement learning is a principal technical problem for cyber-physical systems, fundamentally constrained by the high dimensionality of state spaces, inherent algorithmic instability, and a pervasive risk of policy over-specialization that severely limits generalization to real-world scenarios. The object of this investigation is the iterative process of forming a robust control policy within a simulated environment, while the subject focuses on the influence of specialized reward structures and initial training conditions on policy convergence and generalization capability. The study's aim is to develop and empirically evaluate a deep Q-learning policy optimization method that uses dynamic initial conditions to mitigate over-specialization and achieve stable, globally optimal adaptive control. The developed method formalizes two optimization criteria. First, the adaptive reward function serves as the safety and convergence criterion; it is defined hierarchically, with a major penalty for collision, intermediate incentives for passing checkpoints, and a continuous minor penalty for elapsed time to drive efficiency. Second, the mechanism of dynamic initial conditions acts as the policy generalization criterion, designed to inject the necessary stochasticity into the state distribution. The agent is modeled as a vehicle equipped with an eight-sensor system providing 360-degree coverage and selecting actions from a discrete action space of seven options. Its ten-dimensional state vector integrates normalized sensor distance readings with normalized dynamic characteristics, including speed and angular error. Empirical testing confirmed the policy's vulnerability under baseline fixed-start conditions, where the agent demonstrated over-specialization and stagnated at a traveled distance of approximately 960 conventional units after 40,000 episodes. The subsequent application of the dynamic initial conditions criterion addressed this failure: by forcing the agent to rely on its generalized state mapping instead of trajectory memory, it overcame the learning plateau and enabled full, collision-free track traversal between 53,000 and 54,000 episodes. Final optimization, driven by the elapsed-time penalty, reduced the total track completion time by nearly half. This verification confirms the method's value in producing robust, stable, and efficient control policies suitable for integration into autonomous transport cyber-physical systems.
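
The abstract pins down three concrete pieces of the method: a ten-dimensional state vector (eight normalized sensor distances plus normalized speed and angular error), a hierarchical reward (major collision penalty, intermediate checkpoint incentives, minor per-step time penalty), and episode resets drawn from dynamic initial conditions. The Python sketch below shows one way these pieces could be expressed; every constant, function name, and the start-pose pool are illustrative assumptions, not values taken from the paper.

# Minimal sketch of the ingredients described in the abstract. All numeric
# constants (penalty and bonus magnitudes, sensor range, spawn poses) are
# illustrative assumptions rather than values reported in the paper.
from dataclasses import dataclass
import random
from typing import List

SENSOR_RANGE = 50.0    # assumed maximum sensor ray distance, conventional units
MAX_SPEED = 20.0       # assumed top speed used for normalization
MAX_ANGLE_ERR = 180.0  # assumed bound on heading (angular) error, degrees

R_COLLISION = -100.0   # major penalty: collision ends the episode
R_CHECKPOINT = 10.0    # intermediate incentive for each checkpoint passed
R_TIME_STEP = -0.05    # continuous minor penalty per simulation step

def build_state(sensor_dists: List[float], speed: float, angle_err: float) -> List[float]:
    # Ten-dimensional state: eight normalized sensor readings plus
    # normalized speed and angular error.
    assert len(sensor_dists) == 8, "eight-sensor ring giving 360-degree coverage"
    state = [min(d, SENSOR_RANGE) / SENSOR_RANGE for d in sensor_dists]
    state.append(speed / MAX_SPEED)
    state.append(angle_err / MAX_ANGLE_ERR)
    return state

def step_reward(collided: bool, checkpoints_passed: int) -> float:
    # Hierarchical reward: collision dominates, checkpoints reward progress,
    # and every step pays a small time penalty that drives efficiency.
    if collided:
        return R_COLLISION
    return R_CHECKPOINT * checkpoints_passed + R_TIME_STEP

@dataclass
class StartPose:
    x: float
    y: float
    heading_deg: float

# Hypothetical pool of start poses spread along the track; sampling one per
# episode injects the stochasticity that the dynamic-initial-conditions
# criterion relies on to prevent trajectory memorization.
START_POOL = [
    StartPose(0.0, 0.0, 0.0),
    StartPose(120.0, 35.0, 90.0),
    StartPose(260.0, -10.0, 180.0),
]

def reset_with_dynamic_initial_conditions() -> StartPose:
    # Pick a base pose and perturb it slightly so that no two episodes
    # start from exactly the same state.
    base = random.choice(START_POOL)
    return StartPose(base.x + random.uniform(-2.0, 2.0),
                     base.y + random.uniform(-2.0, 2.0),
                     base.heading_deg + random.uniform(-10.0, 10.0))

Used inside a standard deep Q-learning training loop, a reset function of this kind would replace a fixed spawn at the start of every episode; this per-episode randomization is what the abstract credits with breaking the fixed-start learning plateau. The exact spawn strategy, reward magnitudes, and normalization bounds used by the authors are not specified here.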

Author Biographies

Andrii Pysarenko, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Associate Professor of the Department of Information Systems and Technologies, Faculty of Informatics and Computer Engineering; Candidate of Science (Mathematics), Associate Professor

Mykhailo Drahan, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

PhD student of the Department of Information Systems and Technologies, Faculty of Informatics and Computer Engineering

Published

2025-12-27

How to Cite

[1] A. Pysarenko and M. Drahan, “Deep Q-learning policy optimization method for enhancing generalization in autonomous vehicle control”, Inf. Comput. and Intell. syst. j., no. 7, pp. 96–109, Dec. 2025.