The training process of a deep neural network (DNN) fitting one-dimensional (1-d) functions is nontrivial and not well understood. We design a class of 1-d functions to explore this process. We find that different frequency components of the target function generally have different priorities during training; we call this phenomenon the Frequency Principle (F-Principle). For a 1-d function dominated by low-frequency components, a DNN with commonly used small initial weights that fits the function well after training first quickly captures the dominant low-frequency components and then captures the high-frequency components relatively slowly. In our simulations, the F-Principle is insensitive to the choice of activation function, training algorithm, and DNN structure. We also show that the F-Principle can be used to understand several important phenomena: i) the behavior of DNN training in the information plane; and ii) the generalization of DNNs. The F-Principle may provide a new direction for building effective models that characterize DNN training.
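One way to probe the claim above is to track, during training, how far each Fourier coefficient of the network output is from that of the target. The following is a minimal sketch under stated assumptions, not the setup used in the talk: the target sin(x) + 0.2 sin(5x), the two-layer tanh network, the full-batch gradient descent, and all names and hyperparameters (`train_and_track`, `relative_freq_errors`, `hidden`, `lr`) are hypothetical choices made for illustration.

```python
import numpy as np

def relative_freq_errors(target, output, freq_indices):
    """Relative error of selected DFT coefficients of output vs. target."""
    ft = np.fft.rfft(target)
    fo = np.fft.rfft(output)
    idx = np.asarray(freq_indices)
    return np.abs(fo[idx] - ft[idx]) / np.abs(ft[idx])

def train_and_track(steps=2000, hidden=64, lr=0.005, seed=0, record_every=100):
    """Fit a 1-d target with a small tanh network and record, every
    `record_every` steps, the relative error at DFT indices 1 and 5."""
    rng = np.random.default_rng(seed)
    # 128 samples over one period, so sin(k x) sits exactly at DFT index k.
    x = np.linspace(-np.pi, np.pi, 128, endpoint=False).reshape(-1, 1)
    # Assumed target: dominant low frequency plus a weaker high frequency.
    y = np.sin(x) + 0.2 * np.sin(5 * x)

    # Small random initial weights (an assumption mirroring the abstract).
    W1 = rng.normal(0.0, 0.1, (1, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, 1))
    b2 = np.zeros(1)

    history = []
    for step in range(steps + 1):
        # Forward pass.
        h = np.tanh(x @ W1 + b1)
        out = h @ W2 + b2
        if step % record_every == 0:
            history.append(relative_freq_errors(y[:, 0], out[:, 0], [1, 5]))

        # Backward pass for the mean squared error.
        g_out = 2.0 * (out - y) / len(x)
        gW2 = h.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)
        gW1 = x.T @ g_h
        gb1 = g_h.sum(axis=0)

        # Plain full-batch gradient descent update.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    # Rows: recorded steps; columns: [low-frequency error, high-frequency error].
    return np.array(history)
```

Calling `train_and_track()` returns one row per recorded step. Plotting the two columns against the step index lets one compare how quickly the low- and high-frequency errors shrink for this particular configuration; the outcome depends on the assumed hyperparameters and is not guaranteed to match the reported simulations.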
Zhiqin is currently a Visiting Member at the Courant Institute of Mathematical Sciences at New York University and a Postdoctoral Associate at New York University Abu Dhabi, working with Prof. David W. McLaughlin on mathematical and computational neuroscience. He obtained his B.S. in Physics (Zhiyuan College) and his Ph.D. in Mathematics from Shanghai Jiao Tong University in China under the supervision of Profs. David Cai and Douglas Zhou. His research interests lie in computational neuroscience, ranging from theoretical study and simulation to data analysis.