USAK METHOD FOR THE REINFORCEMENT LEARNING

Authors

Mykhailo Novotarskyi, Valentin Kuzmich

DOI:

https://doi.org/10.20535/2708-4930.1.2020.216042

Keywords:

reinforcement learning, Kanerva coding, function approximation, prototype, value function

Abstract

In the field of reinforcement learning, tabular methods are widely used, and many important scientific results significantly improve their performance in specific applications. However, the applicability of tabular methods is limited by the large amount of memory required to store value functions in tabular form when the state space is high-dimensional. A natural solution to this memory problem is to use parameterized function approximation. Conventional approaches to function approximation, however, often fail to deliver the desired memory reduction on real-world problems. This fact motivated new approaches, one of which is the use of Sparse Distributed Memory (SDM) based on Kanerva coding. A further development of this direction is the Similarity-Aware Kanerva (SAK) method. In this paper, a modification of the SAK method is proposed, the Uniform Similarity-Aware Kanerva (USAK) method, which is based on a uniform distribution of prototypes in the state space. This approach reduces the RAM required to store prototypes. In addition, reducing the receptive distance of each prototype increases the learning speed by reducing the number of computations in the linear approximator.
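
The abstract describes the key ingredients of the approach: prototypes spread uniformly over the state space, a receptive distance that determines which prototypes a given state activates, and a linear approximator over the active prototypes. The following Python sketch illustrates this general scheme under stated assumptions; the class name, grid placement of prototypes, activation rule, and update formula are illustrative choices and are not taken from the paper.

import numpy as np

class UniformKanervaApproximator:
    """Value-function approximation with uniformly placed Kanerva-style prototypes (illustrative sketch)."""

    def __init__(self, state_low, state_high, prototypes_per_dim, receptive_distance, alpha=0.1):
        # Place prototypes on a uniform grid over the state space
        # (one possible realization of a "uniform distribution of prototypes").
        axes = [np.linspace(lo, hi, prototypes_per_dim) for lo, hi in zip(state_low, state_high)]
        grid = np.meshgrid(*axes, indexing="ij")
        self.prototypes = np.stack([g.ravel() for g in grid], axis=1)  # shape: (num_prototypes, dims)
        self.receptive_distance = receptive_distance                   # activation radius of a prototype
        self.weights = np.zeros(len(self.prototypes))                  # weights of the linear approximator
        self.alpha = alpha                                             # learning rate

    def _features(self, state):
        # A prototype is active when the state lies within its receptive distance;
        # a smaller receptive distance means fewer active prototypes and fewer computations.
        dist = np.linalg.norm(self.prototypes - np.asarray(state, dtype=float), axis=1)
        return (dist <= self.receptive_distance).astype(float)

    def value(self, state):
        # The value estimate is a linear combination of the active prototypes' weights.
        return float(self._features(state) @ self.weights)

    def update(self, state, target):
        # Move the weights of the active prototypes toward a learning target
        # (e.g. a TD target), sharing the error equally among them.
        phi = self._features(state)
        n_active = phi.sum()
        if n_active > 0:
            error = target - phi @ self.weights
            self.weights += self.alpha * error * phi / n_active

# Illustrative usage on a 2-D state space (all values are arbitrary).
approx = UniformKanervaApproximator(state_low=[0.0, 0.0], state_high=[1.0, 1.0],
                                    prototypes_per_dim=10, receptive_distance=0.15, alpha=0.5)
approx.update([0.3, 0.7], target=1.0)
print(approx.value([0.3, 0.7]))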

Author Biographies

Mykhailo Novotarskyi

• Doctor of Technical Sciences, Senior Researcher
• Department of Computer Engineering
• National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

Valentin Kuzmich

• Master of Science
• Department of Computer Engineering
• PhD student at National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

References

Stanford University (2020), “CS234: Reinforcement Learning”, available at: https://web.stanford.edu/class/cs234/

The University of Edinburgh (2020), “Reinforcement Learning”, available at: http://www.inf.ed.ac.uk/teaching/courses/rl/

The University of Alberta (2020), “Fundamentals of Reinforcement Learning”, available at: https://www.classcentral.com/course/fundamentals-of-reinforcement-learning-14497

Silver D. (2020), “UCL Course on RL”, University College London, available at: https://www.davidsilver.uk/teaching/

Sutton R. S. and Barto A.G. (2018), “Reinforcement Learning: An Introduction”, Cambridge: The MIT Press, available at: http://www.academia.edu/download/38529120/9780262257053_index.pdf

Slivkins A. (2019), “Introduction to Multi-Armed Bandits”, available at: https://arxiv.org/abs/1904.07272v5

Levin D.A. and Peres Y. (2017), “Markov chains and mixing times”, available at: https://www.statslab.cam.ac.uk/~beresty/teach/Mixing/markovmixing.pdf.

Wiering M. and van Otterlo M. (2012), “Reinforcement Learning”, Berlin: Springer-Verlag.

Hester T. (2013), “TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains”, Berlin: Springer-Verlag.

Whiteson Sh. and Stone P. (2006), “Evolutionary Function Approximation for Reinforcement Learning”, Journal of Machine Learning Research, Vol. 7, pp. 877-917.

Kanerva P. (2003), “Sparse Distributed Memory”, Cambridge: MIT Press.

Cheng Wu and Yiming Wang (2017), “Learning From Big Data: A Survey and Evaluation of Approximation Technologies for Large-Scale Reinforcement Learning”, IEEE International Conference on Computer and Information Technology (CIT), DOI: 10.1109/CIT.2017.11

Wei Li (2019), “Function Approximation-based Reinforcement Learning for Large-Scale Problem Domains”, PhD dissertation, Northeastern University, Boston, Massachusetts.

Wei Li and Meleis W. (2018), “Similarity-Aware Kanerva Coding for On-Line Reinforcement Learning”, Proceedings of the 2nd International Conference on Vision, Image and Signal Processing.

Wei Li and Meleis W. (2018), “Adaptive Adjacency Kanerva Coding for Memory-Constrained Reinforcement Learning”, International Conference on Machine Learning and Data Mining in Pattern Recognition, pp. 187-201.

Sherstov A. A. and Stone P. (2005), “Function Approximation via Tile Coding: Automating Parameter Choice”, International Symposium on Abstraction, Reformulation, and Approximation, pp. 194-205.

Waskow S. J. and Bazzan A.L.C. (2010), “Improving Space Representation in Multiagent Learning via Tile Coding”, Brazilian Symposium on Artificial Intelligence, pp. 153-162.

Cheng Wu (2010), “Novel Function Approximation Techniques for Large-scale Reinforcement Learning”, PhD dissertation, Northeastern University, Boston, Massachusetts.

Karpathy A. (2020), “REINFORCEjs. WaterWorld: DQN”, available at: https://cs.stanford.edu/people/karpathy/reinforcejs/waterworld.html

Published

2020-10-01

How to Cite

[1] M. Novotarskyi and V. Kuzmich, “USAK METHOD FOR THE REINFORCEMENT LEARNING”, Inf. Comput. and Intell. syst. j., no. 1, Oct. 2020.