TY - GEN
T1 - A Case Study
T2 - 12th International Conference on Information and Communication Technology Convergence, ICTC 2021
AU - Shakerimov, Aidar
AU - Li, Dmitriy
AU - Park, Jurn-Gyu
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
AB - One of the serious problems in Reinforcement Learning (RL) algorithms is that their performance usually varies when the same experiment is repeated or reproduced. Although RL results are hard to reproduce because of the algorithms' intrinsic variance, this problem has not been investigated systematically. Through this case study on the Flappy Bird environment, we introduce and characterize four important factors affecting the performance inconsistency of RL algorithms: 1) the level of environment randomness, 2) the order of the action-value update process, 3) the exploration rate strategy, and 4) the choice between on- and off-policy algorithms. Using a quantitative metric (the coefficient of variation), we compare and analyze the results and the effect of each factor on performance inconsistency (variance) in RL. We believe our experimental results and analysis will provide opportunities to obtain an efficient agent that produces more consistent performance results when experiments are repeated or reproduced.
KW - Performance Inconsistency
KW - Q-learning
KW - Reinforcement Learning (RL)
KW - Sarsa algorithm
KW - State Discretization
UR - https://www.scopus.com/pages/publications/85122917357
U2 - 10.1109/ICTC52510.2021.9621017
DO - 10.1109/ICTC52510.2021.9621017
M3 - Conference contribution
T3 - International Conference on ICT Convergence
SP - 611
EP - 615
BT - 2021 International Conference on Information and Communication Technology Convergence (ICTC)
PB - IEEE Computer Society
Y2 - 20 October 2021 through 22 October 2021
ER -