TY - JOUR
T1 - Gradient descent fails to learn high-frequency functions and modular arithmetic
AU - Takhanov, Rustem
AU - Tezekbayev, Maxat
AU - Pak, Artur
AU - Bolatov, Arman
AU - Assylbekov, Zhenisbek
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2025.
PY - 2025/4
Y1 - 2025/4
AB - Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by Statistical Query algorithms. Recently this classical fact re-emerged in the theory of gradient-based optimization of neural networks. In this framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function. The set of functions of the form x → ax mod p, where a is taken from Z_p, has recently attracted attention from deep learning theorists and cryptographers. This class can be understood as a subset of p-periodic functions on Z and is tightly connected with a class of high-frequency periodic functions on the real line. We present a mathematical analysis of the limitations and challenges of using gradient-based techniques to learn a high-frequency periodic function or modular multiplication from examples. We show that the variance of the gradient is negligibly small in both cases when either the frequency or the prime base p is large, which in turn prevents such a learning algorithm from being successful.
KW - Barren plateau
KW - Gradient-based optimization
KW - Hardness of learning
KW - High-frequency periodic functions
KW - Modular multiplication
KW - Statistical query
UR - http://www.scopus.com/inward/record.url?scp=86000303436&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=86000303436&partnerID=8YFLogxK
U2 - 10.1007/s10994-025-06747-8
DO - 10.1007/s10994-025-06747-8
M3 - Article
AN - SCOPUS:86000303436
SN - 0885-6125
VL - 114
JO - Machine Learning
JF - Machine Learning
IS - 4
M1 - 117
ER -