USAK METHOD FOR REINFORCEMENT LEARNING
DOI: https://doi.org/10.20535/2708-4930.1.2020.216042

Keywords: reinforcement learning, Kanerva coding, function approximation, prototype, value function

Abstract
In the field of reinforcement learning, tabular methods are widely used, and many important scientific results significantly improve their performance in specific applications. However, the applicability of tabular methods is limited by the large amount of memory required to store the value function in tabular form when the state space is high-dimensional. A natural solution to the memory problem is parameterized function approximation. However, conventional function approximation approaches often fail to deliver the desired memory reduction on real-world problems. This has motivated new approaches, one of which is Sparse Distributed Memory (SDM) based on Kanerva coding. A further development of this direction is the Similarity-Aware Kanerva (SAK) method. In this paper, a modification of the SAK method is proposed: the Uniform Similarity-Aware Kanerva (USAK) method, which is based on a uniform distribution of prototypes in the state space. This approach reduces the RAM required to store prototypes. In addition, reducing the receptive distance of each prototype increases the learning speed by reducing the number of computations in the linear approximator.
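To illustrate the general idea described in the abstract, the following Python sketch shows Kanerva-style linear value-function approximation with prototypes placed on a uniform grid and a fixed receptive distance. It is only an illustrative sketch, not the authors' USAK implementation: the state dimensionality, grid resolution, receptive distance, and the TD(0) update are assumptions made here for demonstration.

# Illustrative sketch (assumed parameters), not the paper's implementation.
import numpy as np

state_dim = 2          # assumed dimensionality of the state space
n_per_axis = 10        # prototypes per axis -> 10**2 prototypes on a uniform grid
receptive_dist = 0.12  # assumed receptive distance of each prototype

# Uniformly distribute prototypes over the unit hypercube [0, 1]^state_dim.
axes = [np.linspace(0.0, 1.0, n_per_axis) for _ in range(state_dim)]
prototypes = np.array(np.meshgrid(*axes)).reshape(state_dim, -1).T  # shape (100, 2)
weights = np.zeros(len(prototypes))

def features(state):
    # Binary feature vector: 1 for every prototype whose receptive field covers the state.
    dists = np.linalg.norm(prototypes - np.asarray(state), axis=1)
    return (dists <= receptive_dist).astype(float)

def value(state):
    # Linear approximation of the value function.
    return weights @ features(state)

def td0_update(state, reward, next_state, alpha=0.1, gamma=0.99):
    # One TD(0) step on the linear approximator (assumed learning rule for this sketch).
    global weights
    phi = features(state)
    td_error = reward + gamma * value(next_state) - value(state)
    weights += alpha * td_error * phi

For example, td0_update(np.array([0.3, 0.7]), reward=1.0, next_state=np.array([0.35, 0.7])) performs a single update; only the few prototypes lying within the receptive distance of the state contribute, which is why a smaller receptive distance reduces the number of computations per step.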
References
The Stanford University (2020), “CS234: Reinforcement Learning”, available at: https://web.stanford.edu/class/cs234/
The University of Edinburgh (2020), “Reinforcement Learning”, available at: http://www.inf.ed.ac.uk/teaching/courses/rl/
The University of Alberta (2020), “Fundamentals of Reinforcement Learning”, available at: https://www.classcentral.com/course/fundamentals-of-reinforcement-learning-14497
Silver D. (2020), “UCL Course on RL”, University College London, available at: https://www.davidsilver.uk/teaching/
Sutton R. S. and Barto A. G. (2018), “Reinforcement Learning: An Introduction”, Cambridge: The MIT Press, available at: http://www.academia.edu/download/38529120/9780262257053_index.pdf
Slivkins A. (2019), “Introduction to Multi-Armed Bandits”, available at: https://arxiv.org/abs/1904.07272v5
Levin D.A. and Peres Y. (2017), “Markov chains and mixing times”, available at: https://www.statslab.cam.ac.uk/~beresty/teach/Mixing/markovmixing.pdf.
Wiering M. and van Otterlo M. (2012), “Reinforcement Learning”, Berlin: Springer-Verlag.
Hester T. (2013), “TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains”, Berlin: Springer-Verlag.
Whiteson Sh. and Stone P. (2006), “Evolutionary Function Approximation for Reinforcement Learning”, Journal of Machine Learning Research, Vol. 7, pp. 877-917.
Kanerva P. (2003), “Sparse Distributed Memory”, Cambridge: MIT Press.
Cheng Wu and Yiming Wang (2017), “Learning From Big Data: A Survey and Evaluation of Approximation Technologies for Large-Scale Reinforcement Learning”, IEEE, Computer and Information Technology (CIT), International Conference, DOI: 10.1109/CIT.2017.11
Wei Li (2019), “Function Approximation-based Reinforcement Learning for Large-Scale Problem Domains”, PhD dissertation, Northeastern University, Boston, Massachusetts.
Wei Li and Meleis W. (2018), “Similarity-Aware Kanerva Coding for On-Line Reinforcement Learning”, Proceedings of the 2nd International Conference on Vision, Image and Signal Processing.
Wei Li and Meleis W. (2018), “Adaptive Adjacency Kanerva Coding for Memory-Constrained Reinforcement Learning”, International Conference on Machine Learning and Data Mining in Pattern Recognition, pp. 187-201.
Sherstov A. A. and Stone P. (2005), “Function Approximation via Tile Coding: Automating Parameter Choice”, International Symposium on Abstraction, Reformulation, and Approximation, pp. 194-205.
Waskow S. J. and Bazzan A. L. C. (2010), “Improving Space Representation in Multiagent Learning via Tile Coding”, Brazilian Symposium on Artificial Intelligence, pp. 153-162.
Cheng Wu (2010), “Novel Function Approximation Techniques for Large-scale Reinforcement Learning”, PhD dissertation, Northeastern University, Boston, Massachusetts.
Karpathy A. (2020), “REINFORCEjs. WaterWorld: DQN”, available at: https://cs.stanford.edu/people/karpathy/reinforcejs/waterworld.html