TY - GEN
T1 - Uniformly Distributed Data Effects in Offline RL
T2 - 2024 IEEE International Conference on Big Data and Smart Computing, BigComp 2024
AU - Tokayev, Kuanysh
AU - Park, Jurn Gyu
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In the emerging landscape of off-policy reinforcement learning (RL), challenges arise from the significant costs and risks tied to data collection. To address these issues, there is an alternative path: transitioning from off-policy to offline RL, which is characterized by fixed data collection. This stands in contrast to online algorithms, which are sensitive to changes in data during the learning phase. However, the inherent challenge of offline RL lies in its limited interaction with the environment, resulting in inadequate data coverage. Hence, we underscore a convenient application of offline RL that 1) starts from the collection of a static dataset, 2) proceeds to the training of offline RL models, and 3) culminates in testing in the same environment as off-policy RL methodologies. This involves a uniform dataset gathered systematically via non-arbitrary action selection, covering all possible states of the environment. Using the proposed approach, the offline RL model employing a Multi-Layer Perceptron (MLP) achieves a testing accuracy within 1% of the results obtained by the off-policy RL agent. Moreover, we provide a practical guide with datasets, offering tutorials on the application of offline RL in Gridworld-based real-world applications. The guide can be found in this GitHub repository.
AB - In the emerging landscape of off-policy reinforcement learning (RL), challenges arise from the significant costs and risks tied to data collection. To address these issues, there is an alternative path: transitioning from off-policy to offline RL, which is characterized by fixed data collection. This stands in contrast to online algorithms, which are sensitive to changes in data during the learning phase. However, the inherent challenge of offline RL lies in its limited interaction with the environment, resulting in inadequate data coverage. Hence, we underscore a convenient application of offline RL that 1) starts from the collection of a static dataset, 2) proceeds to the training of offline RL models, and 3) culminates in testing in the same environment as off-policy RL methodologies. This involves a uniform dataset gathered systematically via non-arbitrary action selection, covering all possible states of the environment. Using the proposed approach, the offline RL model employing a Multi-Layer Perceptron (MLP) achieves a testing accuracy within 1% of the results obtained by the off-policy RL agent. Moreover, we provide a practical guide with datasets, offering tutorials on the application of offline RL in Gridworld-based real-world applications. The guide can be found in this GitHub repository.
KW - Data Distribution
KW - Deep Learning
KW - DQN
KW - Machine Learning
KW - Offline RL
KW - Tutorial
UR - http://www.scopus.com/inward/record.url?scp=85191449034&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85191449034&partnerID=8YFLogxK
U2 - 10.1109/BigComp60711.2024.00033
DO - 10.1109/BigComp60711.2024.00033
M3 - Conference contribution
AN - SCOPUS:85191449034
T3 - Proceedings - 2024 IEEE International Conference on Big Data and Smart Computing, BigComp 2024
SP - 159
EP - 166
BT - Proceedings - 2024 IEEE International Conference on Big Data and Smart Computing, BigComp 2024
A2 - Unger, Herwig
A2 - Chae, Jinseok
A2 - Lee, Young-Koo
A2 - Wagner, Christian
A2 - Wang, Chaokun
A2 - Bennis, Mehdi
A2 - Ketcham, Mahasak
A2 - Suh, Young-Kyoon
A2 - Kwon, Hyuk-Yoon
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 February 2024 through 21 February 2024
ER -