In this presentation, we delve into three primary areas of research. First, we study the dynamics of the Neural Tangent Kernel (NTK) for finite-width deep residual networks (ResNets) using the neural tangent hierarchy (NTH) framework introduced by Huang and Yau. We show that, for ResNets with a smooth and Lipschitz activation function, the requirement on the layer width m with respect to the number of training samples n can be reduced from quartic to cubic.
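As a minimal illustrative sketch (not taken from the talk), the empirical NTK Gram matrix of a toy two-layer tanh network can be computed directly from parameter gradients; the architecture, the 1/sqrt(m) scaling, and all names here are assumptions for illustration only:

```python
import numpy as np

def param_grad(x, W, a):
    """Gradient of f(x) = sum_k a_k * tanh(w_k . x) / sqrt(m) w.r.t. (W, a), flattened."""
    m = W.shape[0]
    pre = W @ x
    gW = (a * (1.0 - np.tanh(pre) ** 2))[:, None] * x[None, :] / np.sqrt(m)
    ga = np.tanh(pre) / np.sqrt(m)
    return np.concatenate([gW.ravel(), ga])

def empirical_ntk(X, W, a):
    """K_ij = <grad f(x_i), grad f(x_j)>, the empirical NTK Gram matrix."""
    G = np.stack([param_grad(x, W, a) for x in X])
    return G @ G.T

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 3))   # hidden-layer weights, width m = 64
a = rng.normal(size=64)        # output weights
X = rng.normal(size=(5, 3))    # n = 5 training inputs
K = empirical_ntk(X, W, a)     # 5 x 5 Gram matrix; its evolution is what the NTH tracks
```

By construction K = G Gᵀ is symmetric positive semidefinite; at finite width it changes during training, which is the regime the NTH framework analyzes.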
Second, we turn to the intriguing observation that neural networks exhibit distinct behaviors depending on the scale of initialization. Building on previous research, notably the work of Luo et al., we present a phase diagram characterizing the phenomenon of initial condensation in two-layer neural networks and convolutional neural networks. Condensation refers to the situation in which the weight vectors of a neural network concentrate on isolated orientations during training. This phenomenon is a key feature of the nonlinear learning process and contributes to the enhanced generalization capabilities of neural networks.
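One simple way to diagnose condensation numerically (a hypothetical sketch, not the talk's own methodology) is to check the pairwise cosine similarities of the per-neuron input weight vectors: if all pairs are close to ±1, the neurons have collapsed onto a single orientation up to sign.

```python
import numpy as np

def cosine_similarity_matrix(W):
    """W: (m, d) array whose rows are per-neuron input weight vectors."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    U = W / np.clip(norms, 1e-12, None)   # unit-normalize each row
    return U @ U.T                        # S_ij = cos angle between neurons i and j

# Toy example: 4 neurons fully condensed onto one direction (up to sign and scale).
v = np.array([1.0, 2.0])
W = np.stack([v, 2.0 * v, -0.5 * v, 3.0 * v])
S = cosine_similarity_matrix(W)           # condensed iff all |S_ij| are near 1
```

In a condensation experiment one would track how the off-diagonal entries of |S| drift toward 1 over training, as a function of the initialization scale.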
Finally, we shift our focus to the Dropout algorithm, a widely adopted regularization technique in neural network training. We give a rigorous theoretical derivation of stochastic modified equations whose primary objective is to provide an effective approximation of the discrete iterative process of dropout. Moreover, our empirical findings show that Dropout facilitates condensation at the final stage of training.
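For reference, the discrete iteration being approximated applies a random mask at each step. A minimal sketch of standard (inverted) dropout, assuming the common 1/(1-p) rescaling convention rather than anything specific to the talk:

```python
import numpy as np

def dropout(x, p, rng, train=True):
    """Inverted dropout: zero each unit with probability p and rescale
    survivors by 1/(1-p), so that E[output] equals the input."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p       # keep a unit with probability 1 - p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(8)
y = dropout(x, 0.5, rng)                  # entries are either 0.0 or 2.0
```

The stochastic modified equations studied in the talk are continuous-time surrogates for repeatedly applying updates driven by such random masks.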
Yuqing Li is currently a Wen-Tsun Wu Assistant Professor at the School of Mathematical Sciences, Shanghai Jiao Tong University. He obtained his PhD at Purdue University under the supervision of Aaron Yip, and he received his Bachelor's degree from the School of Mathematical Sciences, Shanghai Jiao Tong University. His research interest lies in the general area of applied and computational mathematics, where mathematical tools and numerical methods are developed to understand the training process of deep neural networks. His current research focuses on condensation, a phenomenon wherein the weight vectors of neural networks concentrate on isolated orientations throughout the training process. It is a feature of the nonlinear learning process that enables neural networks to achieve better generalization.
Seminar by the NYU-ECNU Institute of Mathematical Sciences at NYU Shanghai
This event is open to the NYU Shanghai community and the broader mathematics community.